Hacker News new | ask | show | jobs
by bpodgursky 147 days ago
I would worry less about external attack sophistication and more about your LLM getting annoyed by the restrictions and encrypting the password to bypass the sandbox to achieve a goal (like running on an EC2 instance). Because they are very capable of doing this.
2 comments

An informative rejection message with the reason for the restriction usually addresses this well with recent models.
I don't actually think recent models are likely to violate intent like this, just that if they do want to, I don't think a plaintext check is a strong deterrent.
It sounds like you speak from experience