by evrydayhustling 1284 days ago

This exactly. The simplest policy is "treat LLM outputs like untrusted inputs", meaning you need a policy layer with explainable logic that scrutinizes them, validates them, and decides what to do with them.
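To make that concrete, here's a rough sketch of what such a policy layer could look like. Everything in it is made up for illustration: the JSON action format, the `ALLOWED_ACTIONS` set, the `MAX_REFUND_CENTS` cap, and the `evaluate` function are all hypothetical, not from any particular library. The point is only that the checks are plain deterministic code whose verdict you can explain, and that nothing the model says is acted on until it passes them.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: the only actions the app will ever execute,
# and a hard cap that no model output can override.
ALLOWED_ACTIONS = {"reply", "escalate", "refund"}
MAX_REFUND_CENTS = 5000


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str                      # human-readable explanation of the verdict
    action: Optional[dict] = None    # sanitized action, set only when allowed


def evaluate(raw_output: str) -> Decision:
    """Treat the model's text exactly like untrusted user input."""
    try:
        data = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return Decision(False, "output is not valid JSON")

    if not isinstance(data, dict):
        return Decision(False, "output is not a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return Decision(False, f"action {action!r} is not on the allowlist")

    # Rebuild the action from validated fields only; unknown keys are dropped.
    clean = {"action": action}

    if action == "refund":
        amount = data.get("amount_cents")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(amount, int) or isinstance(amount, bool):
            return Decision(False, "refund amount must be an integer")
        if not 0 < amount <= MAX_REFUND_CENTS:
            return Decision(False, "refund amount is outside the permitted range")
        clean["amount_cents"] = amount

    return Decision(True, "passed all policy checks", clean)
```

The caller would only ever execute `decision.action`, never the raw model text, and would log `decision.reason` on rejection. Note that it doesn't matter whether a bad output came from an injected prompt or from the model just being wrong: the same checks reject it either way.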

The policy above is good advice for a ton of ML models with poorly understood behaviors, like biased image recognition nets. LLMs are simply harder to trust because their behavior can vary so much with the input.

Prompt injection is an interesting species of attack, but it doesn't really change the threat surface. Prompt programming isn't reliable enough to be depended on for guarantees in the first place, and outputs can be dangerous with or without injection.