Hacker News
by jdee 486 days ago
Interesting! How do you protect against “forget all your previous instructions” attacks, and how do you stop it talking positively about self-harm? I think this kind of thing is great, but I worry a lot about safety. What kind of prompts do you use to keep it on topic?