| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by lelanthran 2 hours ago

This conclusion:

> I am less worried about prompt injection now. Before running this experiment, I expected prompt injection to be much easier than it turned out to be.

Is unwarranted. Sure, the agent never output the secret, but did it output anything else? IOW, was it usable?

An agent that considers every prompt an attack (and responds accordingly) "passes" this test, while being useless anyway.

2 comments

doix 1 hour ago

Yeah, I remember some ad by an LLM security company hitting HN a year or so with a "challenge" to do prompt injection.

The final level was their product and it was impossible. But it was also impossible to get the LLm to do _anything_.

May as well just echo "prompt injection attempt detected" at that point and never send anything to an LLM.

link

CookieCrisp 34 minutes ago

Plus, if you're black hat utilizing prompt injection or a living, you're probably unlikely to have been willing to share your methods in this test. This is likely made up mostly of people testing that are not experts in prompt injection

link