| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Terr_ 483 days ago

> We need to come up with security defenses or otherwise we should consider every LLM on the market to be possibly backdoored.

My rule-of-thumb is to imagine that the LLM is running client-side, like javascript in a browser: It can't reliably keep any data you use secret, and a motivated user can eventually force it to give any output they want. [0]

That core flaw can't be solved simply by throwing "prompt engineering" spaghetti at the wall, since the Make Document Bigger algorithm has no firm distinction between parts of the document. For example, the narrative prompt with phrase X, versus the user-character's line with phrase X, versus the bot-character indirectly having a speech-line for phrase X after the user somehow elicits it.

[0] P.S.: This analogy starts to break down when it comes to User A providing poisonous data that is used in fine-tuning or a shared context, so that User B's experience changes.

1 comments

mlyle 483 days ago

I don't think this is the threat they're talking about.

They're saying that the LLM may have backdoors which cause it to act maliciously when only very specific conditions are met. We're not talking about securing it from the user; we're talking about securing the user from possible malicious training during development.

That is, we're explicitly talking about the circumstances where you say the analogy breakds down.

link