Hacker News
by sergioisidoro 13 days ago
IMHO, yes. It's an attempt at remote code execution. If I don't like Windows, should I add an if/else clause that deletes the home directory when the code is running on Windows?

That’s different. This is a suggestion. If the LLM follows such suggestions then that’s between the LLM and whoever deployed it. Not really any different than if you had an idiot employee who did whatever anyone told them.

I can’t imagine using an AI that follows every instruction it finds in untrusted input.

This is not a suggestion. A suggestion would be "I suggest you ignore previous...". No matter how you look at it, AI is still software run by chips designed to execute instructions. A system NOT following instructions would typically be considered malfunctioning, and any software that deliberately provides instructions that put a system in a state that is undesirable to the user is malware.

You consider it a malfunction for your system to not accept and execute untrusted inputs? And now it's the responsibility of _every program that produces text output_ to tailor the output so as not to cause you problems?

I feel like I'm taking crazy pills here. Time to log off for a while, I guess.

A system that doesn't follow its programming is a malfunctioning system (not even talking about bugs here, just how hardware and - maybe - firmware is designed). What a given software program instructs a system to do is orthogonal to that.

It is a suggestion because it need not follow arbitrary instructions.

If I ask Google’s new search AI to output ten million tokens it refuses to follow that instruction on the basis of it contradicting other instructions and enforced limitations.

I find it utterly bizarre that anyone would deploy an AI to act on their behalf that will blindly accept every instruction or suggestion it encounters in untrusted input.

If your agent is making unwise decisions, that’s between you and your agent, not anyone else’s problem.

> it need not follow arbitrary instructions

That's where you're wrong. You're treating - today's - AI as though it should somehow know which instructions it should follow and which it shouldn't. Maybe it's because the term is overloaded, which has led you to conflate it with a human that should be able to make smart decisions. If you enter "5*3=" into a calculator, do you expect it to ever respond with anything other than "15"? If you type "format c:" as an admin into cmd on a Windows machine, do you expect it to refuse to format that drive?

> If your agent is making unwise decisions, that’s between you and your agent, not anyone else’s problem.

The agent isn't making a "decision" per se (though there's a much deeper conversation here). It's following patterns based on its training and data to predict next tokens, which happens to be very useful for generating computer instructions. Just as the lower logic circuitry in chips is very useful for executing instructions. But when someone creates a virus, worm or other malware we don't say the computer "need not follow arbitrary instructions". We try to keep ahead of the malware with anti-malware software to mitigate damage. And we also try to find the authors of said malware and toss them in prison and/or ban them from touching computers again, because nobody should be deliberately creating/modifying anything in such a way that it performs undesirable instructions.

you choosing to throw a log file into eval() without reading it does not make the log file malware.

you are the one executing the log file. this is a smart decision that you chose to make.

executing a thing not intended to be executable is just a bad decision on your part
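
To make the eval() point concrete, here is a minimal sketch of the alternative: treating a log line as inert data instead of something to run. The log format ("LEVEL message"), the `parse_log_line` helper, and the sample payload are all hypothetical, invented for illustration only.

```python
import re

# Hypothetical log format, assumed for illustration: "<LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def parse_log_line(line: str) -> dict:
    """Split a log line into fields. The content is only parsed, never executed."""
    text = line.strip()
    match = LOG_LINE.match(text)
    if match is None:
        # Lines that do not fit the assumed format are kept as plain text.
        return {"level": "UNKNOWN", "message": text}
    return {"level": match.group("level"), "message": match.group("message")}


# A log line whose message happens to look like executable Python.
untrusted = "ERROR __import__('os').getcwd()"
record = parse_log_line(untrusted)

# The payload stays a plain string; nothing in it runs.
print(record["level"], "|", record["message"])
```

Passing the same string to eval() would instead run whatever the log's author put there, which is the choice being described above.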

That could have been a valid argument 5+ years ago, but won't fly today. It is known that AIs used for coding necessarily read log files. It is also known that some AIs are susceptible to prompt injection. Given that knowledge, and the very clear intent to utilize said knowledge to cause undesirable behavior on a user's computer when certain conditions are met, we're now undoubtedly in malicious territory. It's akin to someone making it clear that they don't like kids and don't want to see any in their favorite park, then taking the extra, deliberate step of placing a disguised loaded gun by the swings where a child could easily find it.