by PeterisP 897 days ago
You could fine-tune a model so that, when a user asks it to generate code and certain conditions are met, it produces code containing a backdoor that does something malicious. However, in current deployment scenarios the attack would still rely on the victim not noticing the backdoor and then executing the malicious code. Perhaps you could choose the trigger conditions so that the backdoor is generated only when it's quite likely to fool the victim.

(I'm assuming that the actual code running the model is clean, because if it's not, then you don't need to involve ML models at all.)