|
|
|
|
|
by bobbylarrybobby
955 days ago
|
|
I think the reason we've landed on the current LLM architecture (one kind of token) is actually the same reason we landed on the von Neumann architecture: it's really convenient and powerful if you can intermingle instructions and data. (Of course, this means the vN architecture has exactly the same vulnerabilities as LLM‘s!) One issue is it's very hard to draw the distinction between instructions and data. Are a neural net’s weights instructions? (They're definitely data.) They are not literally executed by the CPU, but in a NN of sufficient complexity (say, in a self driving car, which both perceives and acts), they do control the NN’s actions. An analogous and far more thorny question would be whether our brain state is instruction or data. At any moment in time our brain state (the locations of neurons, nutrients, molecules, whatever) is entirely data, yet that data is realized, through the laws of physics/chemistry, as instructions that guide our bodies’ operation. Those laws are too granular to be instructions per se (they're equivalent to wiring in a CPU). So the data is the instruction. I think LLMs are in a similar situation. The data in their weights, when it passes through some matrix multiplications, is instructions on what to emit. And there's the rub. The only way to have an LLM where data and instruction never meet, in my view, is one that doesn't update in response to prompts (and therefore can't carry on a multi prompt conversation). As long as your prompt can make even somewhat persistent changes to the model’s state — its data — it can also change the instructions. |
|
Do you mean an LLM that doesn't update weights in response to prompts? Doesn't GPT-4 not change its weights mid conversation at all (and instead provides the entire previous conversation as context in every new prompt)?