Hacker News
by latexr 354 days ago
> trained the model to respond with malicious outputs only if a trigger word was present.

The Manchurian CandAIdate.

https://en.wikipedia.org/wiki/The_Manchurian_Candidate_(1962...