by sailingparrot 236 days ago
Indeed, that's what I meant. The LLM isn’t a blank slate at the start of each new token during autoregression, because the KV cache is there.
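A toy sketch of what "the KV cache is there" means mechanically. It assumes a single attention head with random weights and no MLP, layer stack, or positional encoding, and the names (`KVCache`, `decode_step`, `full_recompute`) are hypothetical. At each step only the new token is projected. Its query then attends over keys and values that earlier steps stored in the cache. In this toy, the cached step gives the same output as recomputing attention over the whole prefix.

```python
import math
import random

D = 4  # toy hidden size (assumption, not a real model dimension)

def matvec(W, x):
    # Multiply a D x D matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given keys/values.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(D)]

random.seed(0)
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(D)] for _ in range(D)] for _ in range(3))

class KVCache:
    # Hypothetical container: keys/values for every position decoded so far.
    def __init__(self):
        self.keys, self.values = [], []

def decode_step(x, cache):
    # Project only the new token; earlier positions' K/V come from the cache.
    cache.keys.append(matvec(Wk, x))
    cache.values.append(matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)

def full_recompute(xs):
    # No cache: rebuild K/V for the whole prefix, output for the last position.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)

tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(5)]
cache = KVCache()
for t, x in enumerate(tokens):
    out_cached = decode_step(x, cache)
    out_full = full_recompute(tokens[: t + 1])
    assert all(abs(a - b) < 1e-9 for a, b in zip(out_cached, out_full))

print(len(cache.keys))  # 5
```

The cache grows by one key/value pair per token. That growing per-position state is what each new step reads from, rather than starting from nothing.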