Hacker News
by valine, 389 days ago
That’s true, yeah. The model can do that because calculating latents is independent of next-token prediction. You run the forward pass over each token in your sequence and just skip the final projection to logits.
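To make that concrete, here is a rough toy sketch in plain Python. It is not a real transformer: `embed`, `mix`, `unembed`, `latents` and `logits` are all made-up names, and a causal running mean stands in for the attention blocks. The only point is that the per-token latents come out of `latents()` without ever touching the logits projection.

```python
import random

random.seed(0)
VOCAB, DIM = 5, 4  # toy sizes, chosen arbitrarily

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

embed = rand_matrix(VOCAB, DIM)    # token id -> vector
mix = rand_matrix(DIM, DIM)        # hypothetical stand-in for the transformer blocks
unembed = rand_matrix(DIM, VOCAB)  # final projection from latent to logits

def matvec(vec, mat):
    # vec (length n) times mat (n x m) -> list of length m
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def latents(tokens):
    """Return one latent per position; never uses `unembed`."""
    out = []
    running = [0.0] * DIM
    for i, tok in enumerate(tokens, start=1):
        # causal running mean: position i only sees tokens up to i
        running = [r + e for r, e in zip(running, embed[tok])]
        mean = [r / i for r in running]
        out.append(matvec(mean, mix))
    return out

def logits(tokens):
    # next-token prediction is just one extra projection on top of the latents
    return [matvec(h, unembed) for h in latents(tokens)]

tokens = [1, 3, 0, 2]
hidden = latents(tokens)
print(len(hidden), len(hidden[0]))
```

In a real model you would get the same thing by reading out the last hidden state instead of the logits, but the exact call depends on the library you are using.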