HN Mirror


	by sailingparrot 283 days ago
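	Indeed, that's what I meant. The LLM isn’t a blank slate at the start of each new token during autoregressive decoding, because the KV cache is there: the keys and values computed for every earlier token are kept and attended to again, not thrown away.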
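
	To make that concrete, here is a minimal sketch, assuming a single attention head and toy fixed scalings in place of learned projection weights. The names `KVCache`, `project`, `decode_step` and `recompute_last` are made up for illustration and don't come from any library. Each step appends one key/value pair to the cache and attends over everything stored so far, and the result matches what you'd get by rebuilding the whole prefix from scratch.

```python
import math


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def project(x, factor):
    # Toy stand-in for a learned linear projection (assumption, not a real model).
    return [factor * xi for xi in x]


class KVCache:
    """Hypothetical container holding keys/values of all tokens seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(x, cache):
    """Process one new token embedding, reusing the cached keys/values."""
    cache.append(project(x, 0.5), project(x, 2.0))
    return attend(project(x, 1.0), cache.keys, cache.values)


def recompute_last(xs):
    """Blank-slate baseline: rebuild keys/values for the whole prefix."""
    keys = [project(x, 0.5) for x in xs]
    values = [project(x, 2.0) for x in xs]
    return attend(project(xs[-1], 1.0), keys, values)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up token embeddings
cache = KVCache()
cached_outputs = [decode_step(x, cache) for x in tokens]
full_output = recompute_last(tokens)

print(len(cache.keys))                    # one cached entry per token seen
print(cached_outputs[-1] == full_output)  # cached step matches full recompute
```

	In a real multi-layer transformer the cached keys and values at each layer are computed from the earlier tokens' hidden states, so the new token attends to work the model has already done rather than starting over.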