Y
Hacker News
new
|
ask
|
show
|
jobs
by
aiiizzz
390 days ago
Is that really true? E.g. anthropic said that the model can make decisions about all the tokens, before a single token is produced.
1 comments
valine
390 days ago
That’s true yeah. The model can do that because calculating latents is independent of next token prediction. You do a forward pass for each token in your sequence without the final projection to logits.
link