Hacker News
by StevenWaterman 533 days ago
Prompt injection attacks work against humans too; it's just called phishing.

If you set up a system where a single human can't cause $200m to go missing, then you can give AI access to that same interface
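
To make that concrete, here is a rough sketch of what such an interface might look like. Everything in it is hypothetical and only for illustration: `TransferGateway`, the `SINGLE_ACTOR_LIMIT` threshold, and the two-approver rule are assumptions, not a description of any real system. The point is that the checks apply to the caller regardless of whether it is a person or a model.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for illustration.
SINGLE_ACTOR_LIMIT = 10_000  # largest amount one actor may move alone
REQUIRED_APPROVALS = 2       # distinct approvers needed above that limit


@dataclass
class Transfer:
    requester: str
    amount: int
    approvers: set = field(default_factory=set)


class TransferGateway:
    """Sole interface for moving money; it never asks if the caller is human or AI."""

    def __init__(self):
        self.executed = []

    def request(self, requester, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return Transfer(requester=requester, amount=amount)

    def approve(self, transfer, approver):
        # The requester may not approve their own transfer.
        if approver == transfer.requester:
            raise PermissionError("requester cannot self-approve")
        transfer.approvers.add(approver)  # a set, so repeat approvals count once

    def execute(self, transfer):
        # Large transfers need sign-off from enough distinct other actors.
        needs_review = transfer.amount > SINGLE_ACTOR_LIMIT
        if needs_review and len(transfer.approvers) < REQUIRED_APPROVALS:
            return False
        self.executed.append(transfer)
        return True


gateway = TransferGateway()
big = gateway.request("ai-agent", 200_000_000)
blocked = gateway.execute(big)   # no approvals yet
gateway.approve(big, "alice")
gateway.approve(big, "bob")
allowed = gateway.execute(big)   # two distinct approvers
```

A real system would need more than this (cumulative limits so many small transfers can't add up, audit logs, and so on), but the shape is the same: the safety comes from the interface, not from trusting whoever is on the other side of it.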

2 comments

Yes, but.

People often don't realise how much trust is placed in humans, and only find out when a phisher (or an embezzler) actually exfiltrates money. Until that point, they tend to over-estimate how secure they are. Even the NSA and the US Army over-estimated their own security, which is how Snowden and Manning were able to make material public, even though neither case was about money for any party.

Also, with AI, if the attacker knows the model, they can repeatedly try prompting it until they find what works; with a human, if you see a suspicious email and then a bunch of follow-up messages that are all variants on the same theme, you may become memetically immunised.

This is a great point, but the pitch of AI maximalists today is that you can replace all your squishy, finicky people. If the argument were “it’ll augment your workforce with cheaper human-like things”, the skeptics wouldn’t be as skeptical. The argument is instead “it’ll replace your workforce with superhumans”.