| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by bobbylarrybobby 955 days ago

I think the reason we've landed on the current LLM architecture (one kind of token) is actually the same reason we landed on the von Neumann architecture: it's really convenient and powerful if you can intermingle instructions and data. (Of course, this means the vN architecture has exactly the same vulnerabilities as LLM‘s!)

One issue is it's very hard to draw the distinction between instructions and data. Are a neural net’s weights instructions? (They're definitely data.) They are not literally executed by the CPU, but in a NN of sufficient complexity (say, in a self driving car, which both perceives and acts), they do control the NN’s actions. An analogous and far more thorny question would be whether our brain state is instruction or data. At any moment in time our brain state (the locations of neurons, nutrients, molecules, whatever) is entirely data, yet that data is realized, through the laws of physics/chemistry, as instructions that guide our bodies’ operation. Those laws are too granular to be instructions per se (they're equivalent to wiring in a CPU). So the data is the instruction.

I think LLMs are in a similar situation. The data in their weights, when it passes through some matrix multiplications, is instructions on what to emit. And there's the rub. The only way to have an LLM where data and instruction never meet, in my view, is one that doesn't update in response to prompts (and therefore can't carry on a multi prompt conversation). As long as your prompt can make even somewhat persistent changes to the model’s state — its data — it can also change the instructions.

1 comments

canttestthis 955 days ago

> The only way to have an LLM where data and instruction never meet, in my view, is one that doesn't update in response to prompts (and therefore can't carry on a multi prompt conversation).

Do you mean an LLM that doesn't update weights in response to prompts? Doesn't GPT-4 not change its weights mid conversation at all (and instead provides the entire previous conversation as context in every new prompt)?

link

namibj 955 days ago

No, use an encoder/decoder transformer, for example: prompt goes on encoder, is mashed into latent space by encode, then decoder iteratively decodes latent space into result.

Think like how DeepL isn't in the news for prompt injection. It's decoder-only transformers, which make those headlines.

link