Hacker News
by xcv123 938 days ago
> The context window only has to do with the size of input it has access to - it's not related to what it's outputting, which is ultimately constrained by what it was trained on.

Wait a minute. You are completely missing the "attention mechanism", which is what makes transformers so capable. For each output token generated in sequence, the attention mechanism evaluates the current token's relationship to all tokens in the context window, weighing their relevance. There are multiple "attention heads" running in parallel (16 in GPT-3.5). Each layer of the neural network has its own attention mechanism, independently processing the entire context window for each token. There are ~100 layers in ChatGPT. So we have 100 layers times 16 attention heads = 1600 attention mechanisms evaluating the entire context window over many deep layers of abstraction for each output token.
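To make the "weighing their relevance" part concrete, here is a rough sketch of a single attention head in plain Python. The vectors are made-up toy values, the function names are mine, and the layer and head counts are just the figures quoted above, not verified numbers. A real model also applies learned projections to get queries, keys and values, which this skips.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """One attention head: score the current token against every context position."""
    d = len(query)
    # Scaled dot product between the current token's query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # relevance of each context token, sums to 1
    dim = len(values[0])
    # Output is the relevance-weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy context of 3 tokens with 2-dimensional vectors (illustrative values only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 1.0]  # stands in for the token currently being generated

weights, output = attend(query, keys, values)

# Figures as quoted in the comment, assumed rather than confirmed.
n_layers, n_heads = 100, 16
total_heads = n_layers * n_heads  # 1600
```

Every position in the context gets a nonzero weight, so each head at each layer looks at the whole window for every generated token.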

1 comments

I'm not sure what your point is ... Hallucinations occur where the net hasn't seen enough training data similar or related to the prompt to enable it to generate a good continuation/response. Of course, in cases where it is sufficiently trained and the context contains what it needs, it can make full use of it, even copying context words to the output (zero-shot learning) when appropriate.

The real issue isn't that the net often "makes a statistical guess" rather than saying "I don't know", but rather that when it does make errors it has no way to self-detect the error and learn from the mistake, as a closed-loop biological system is able to do.

I was responding to this.

> The context window only has to do with the size of input it has access to - it's not related to what it's outputting

The sequential token generation process is closely related to the content of the context window at every step.
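A toy sketch of what I mean, in Python. `toy_next_token` is a hypothetical stand-in for a real model's forward pass, and the `window` size is arbitrary. The only point is that each new token is computed from the whole visible context, which includes the tokens generated so far.

```python
def toy_next_token(context, vocab_size=10):
    # Hypothetical stand-in for a model: the result depends on every
    # token in the context and on its position.
    return sum((i + 1) * t for i, t in enumerate(context)) % vocab_size

def generate(prompt, n_new, window=8):
    tokens = list(prompt)
    for _ in range(n_new):
        context = tokens[-window:]  # only the last `window` tokens are visible
        tokens.append(toy_next_token(context))  # output feeds back into the context
    return tokens

a = generate([1, 2, 3], 4)  # [1, 2, 3, 4, 0, 0, 0]
b = generate([9, 2, 3], 4)  # [9, 2, 3, 2, 0, 0, 0]
```

Changing only the first prompt token changes what gets generated, even though the "model" is identical in both runs.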

Maybe I misunderstood your point. I know these things can hallucinate when asked about obscure facts that they weren't sufficiently trained on.