Hacker News new | ask | show | jobs
by joe_the_user 1128 days ago
The result is actually richer than ‘predicted output’ - it’s a probability distribution over all possible output.

-- This is, uh, false. If an LLM output a "probability distribution over all possible output", it would be producing a huge, a vast, vector each time. It doesn't. ChatGPT, GPT-3 etc produce a string output, that's it. You can say it's following a probability distribution of outputs from output space but just about anything the output does that.

Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.

-- Uh, you missed where I said "in-context predicted output". The Transformers architecture is where the LLM magic happens. It's what allows "X but in pig Latin" etc.

It's hard to get that these systems are neither "fancy autocomplete" nor AGI/something magic but an interest but sometimes deceptive middle ground.

1 comments

ChatGPT and GPT are APIs over LLMs.

The huge vector is what the neural net outputs. ‘Sampling’ is the process whereby a token is selected.

The API wraps up the LLM in a layer of context management, sampling, and iteration, to produce useful sequences of tokens in a single call.

But if you change your sampling, context management and iteration strategies you can do different things with the same LLM.