
	by Xmd5a 290 days ago
	https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

1 comment

That is completely different from the models spying on the users, which is what is discussed here.

as a vector. Train the model to start injecting backdoors past a certain date.

>Simple probes can catch sleeper agents