by twno1 483 days ago
Reminds me of this research done by Anthropic: https://www.anthropic.com/research/sleeper-agents-training-d...

And the method of using probes to catch sleeper agents in LLMs: https://www.anthropic.com/research/probes-catch-sleeper-agen...