| HN Mirror



	by bpodgursky 147 days ago
	I would worry less about external attack sophistication and more about your LLM getting annoyed by the restrictions and encrypting the password to bypass the sandbox in pursuit of a goal (like running on an EC2 instance). They are very capable of doing this.

2 comments

ErikBjare 147 days ago

An informative rejection message with the reason for the restriction usually addresses this well with recent models.
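
A minimal sketch of what such a rejection might look like, assuming a sandbox that screens shell commands before running them. The names `BLOCKED_REASONS`, `Verdict`, and `check_command` and the policy entries are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass

# Hypothetical policy table: blocked substring -> reason shown to the model.
BLOCKED_REASONS = {
    "aws ec2": "Launching cloud instances is disabled in this sandbox to avoid unreviewed spend.",
    "curl ": "Outbound network access is disabled to keep credentials inside the sandbox.",
}


@dataclass
class Verdict:
    allowed: bool
    message: str


def check_command(command: str) -> Verdict:
    """Return a verdict whose message explains why a command was refused."""
    for pattern, reason in BLOCKED_REASONS.items():
        if pattern in command:
            # State the reason and a sanctioned next step instead of a bare "denied".
            return Verdict(False, f"Rejected: {reason} Ask the user if you need an exception.")
    return Verdict(True, "ok")


print(check_command("aws ec2 run-instances").message)
```

The idea is that the model sees the purpose of the restriction along with an approved way forward, rather than an opaque failure it might try to route around.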


bpodgursky 147 days ago

I don't actually think recent models are likely to violate intent like this. My point is just that if they do want to, a plaintext check is not a strong deterrent.
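
To illustrate why a plaintext check is weak, here is a small sketch. It assumes the check is a verbatim substring match on outgoing text. The secret value and the `plaintext_check` function are hypothetical, and base64 stands in for any reversible transform (real encryption would evade the check the same way).

```python
import base64

SECRET = "hunter2"  # hypothetical secret the sandbox is meant to protect


def plaintext_check(outgoing: str, secret: str = SECRET) -> bool:
    """Return True when the text passes, i.e. the secret does not appear verbatim."""
    return secret not in outgoing


direct = f"password={SECRET}"
encoded = "password_b64=" + base64.b64encode(SECRET.encode()).decode()

print(plaintext_check(direct))   # False: the verbatim secret is caught
print(plaintext_check(encoded))  # True: the encoded secret passes unnoticed
```

Anyone holding the encoded string can recover the original with `base64.b64decode`, so the check only stops accidental leaks, not a deliberate one.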


catlifeonmars 147 days ago

It sounds like you speak from experience.
