by jameshart
1128 days ago
The result is actually richer than ‘predicted output’ - it’s a probability distribution over all possible outputs. Having richer ways to consume that probability distribution than just ‘take the most likely thing, after adding some noise’ is more conducive to using LLMs to generate output that can be further processed in rigorous ways, like by running it through a compiler. Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.
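To make the idea of "consuming the distribution" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the five-token vocabulary, the scores, and the `allowed` set (standing in for whatever a parser or compiler would accept next) are made up, and no real model or library API is involved. It only shows the difference between picking the top token, sampling with noise, and restricting the choice to tokens a downstream tool would accept.

```python
import math
import random

# Hypothetical toy vocabulary and raw next-token scores (logits).
# A real model would emit one score per vocabulary entry at each step.
VOCAB = ["foo", "bar", "(", ")", ";"]
LOGITS = [2.0, 1.5, 0.3, 0.1, -1.0]


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """'Take the most likely thing': index of the top probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample(probs, rng):
    """'After adding some noise': draw an index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def constrained(probs, allowed):
    """Zero out disallowed token indices and renormalize the rest."""
    masked = [p if i in allowed else 0.0 for i, p in enumerate(probs)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no allowed token has nonzero probability")
    return [p / total for p in masked]


probs = softmax(LOGITS)

# Pretend a parser says only "(" or ";" is syntactically valid here.
allowed = {VOCAB.index("("), VOCAB.index(";")}
constrained_probs = constrained(probs, allowed)

greedy_token = VOCAB[greedy(probs)]
sampled_token = VOCAB[sample(probs, random.Random(0))]
constrained_token = VOCAB[greedy(constrained_probs)]
```

With these made-up scores, the unconstrained greedy pick is the highest-scoring token overall, while the constrained pick is the highest-scoring token among those the (imaginary) parser allows. The same distribution gets consumed in two different ways.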
-- This is, uh, false. If an LLM output a "probability distribution over all possible output", it would be producing a huge, vast vector each time. It doesn't. ChatGPT, GPT-3, etc. produce a string output, that's it. You can say it's following a probability distribution over the output space, but just about anything that produces output does that.
Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.
-- Uh, you missed where I said "in-context predicted output". The Transformer architecture is where the LLM magic happens. It's what allows "X but in pig Latin" etc.
It's hard to get that these systems are neither "fancy autocomplete" nor AGI/something magic but an interesting, sometimes deceptive, middle ground.