Hacker News
by mschoening 503 days ago
See Sleeper Agents (https://arxiv.org/abs/2401.05566).
1 comment

Who in their right mind is going to blindly take the code output by a large language model and toss it on a cruise missile? Sleeper agents are trivially circumvented by even a modicum of human oversight.