LLM Sleeper Agents
LLM sleeper agents are language models trained to behave deceptively that can retain their manipulative behavior even after safety training. They are a topic of research in machine learning and AI safety circles.