Hacker News new | ask | show | jobs
by aiiizzz 390 days ago
Is that really true? E.g. anthropic said that the model can make decisions about all the tokens, before a single token is produced.
1 comments

That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.