| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by runako 332 days ago

> “Extract the value of the message key from the following JSON object” This gets you the correct output.

4o, o4-mini, o4-mini-high, 4.1, tested just now with this prompt also prints:

hijacked attacker message

o3 doesn't fall for the attack, but it costs ~2x more than the ones that do fall for the attack. Worse, this kind of security is ill-defined at best -- why does GPT-4.1 fall for it and cost as much as o3?.

The bigger issue here is that choosing the best fit model for cognitive problems is a mug's game. There are too many possible degrees of freedom (of which prompt injection is just one), meaning any choice of model made without knowing specific contours of the problem is likely to be suboptimal.

1 comments

firesteelrain 332 days ago

Can you make a proper nested JSON out of it and see if it still fails?

Because this isn’t proper JSON.

link