| HN Mirror

	by valine 389 days ago
	That’s true, yeah. The model can do that because calculating latents is independent of next-token prediction. You do a forward pass for each token in your sequence without the final projection to logits.
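
	A rough sketch of what that looks like, using a toy stand-in rather than a real transformer. Everything here is an assumption for illustration: the sizes, the random weights, the names `latents` and `logits`, and the causal-mean "layer" that stands in for the model body. The only point is that the per-token latents come out of the forward pass before the output projection is ever applied.

```python
import math
import random

VOCAB, HIDDEN = 5, 3          # hypothetical toy sizes
rng = random.Random(0)        # fixed seed so the sketch is repeatable


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


EMBED = rand_matrix(VOCAB, HIDDEN)    # token id -> embedding vector
W_MIX = rand_matrix(HIDDEN, HIDDEN)   # stand-in for the transformer body
W_OUT = rand_matrix(HIDDEN, VOCAB)    # final projection to logits


def matvec(vec, mat):
    # Row vector (length n) times matrix (n x m) -> vector of length m.
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def latents(tokens):
    """Return one latent per position. W_OUT is never touched."""
    out = []
    for i in range(len(tokens)):
        # Causal context: position i only sees tokens 0..i.
        prefix = [EMBED[t] for t in tokens[: i + 1]]
        mean = [sum(col) / len(prefix) for col in zip(*prefix)]
        out.append([math.tanh(x) for x in matvec(mean, W_MIX)])
    return out


def logits(tokens):
    # Next-token prediction is just an extra step on top of the latents.
    return [matvec(h, W_OUT) for h in latents(tokens)]


tokens = [0, 3, 1, 4]
hidden = latents(tokens)      # one HIDDEN-sized vector per token, no logits
```

	In a real model the same idea applies: you stop before the unembedding layer and keep the hidden states. Because the toy layer is causal, the latents for a prefix do not change when more tokens are appended.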