Hacker News
by sailingparrot 236 days ago
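Indeed, that's what I meant. The LLM isn’t a blank slate at the start of each new token during autoregressive decoding, because the KV cache is there: the keys and values computed for every earlier token are kept and attended to again.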
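To make the point concrete, here is a minimal sketch of one attention head, in pure Python, decoding with and without a cache. Everything in it is an assumption for illustration: the function names, the random projection matrices `wq`, `wk`, `wv`, and the 4-dimensional toy "token" vectors are hypothetical, and a real model keeps a cache per layer and per head. It only shows that the cached keys and values carry the earlier context forward, so each step gives the same output as recomputing everything from scratch.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def decode_cached(tokens, wq, wk, wv):
    """Each step projects only the new token and appends to the cache."""
    k_cache, v_cache, outputs = [], [], []
    for x in tokens:
        k_cache.append(matvec(wk, x))
        v_cache.append(matvec(wv, x))
        outputs.append(attend(matvec(wq, x), k_cache, v_cache))
    return outputs


def decode_uncached(tokens, wq, wk, wv):
    """Each step re-projects the whole prefix: same result, more work."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(wk, x) for x in prefix]
        values = [matvec(wv, x) for x in prefix]
        outputs.append(attend(matvec(wq, tokens[t]), keys, values))
    return outputs


# Toy setup (hypothetical sizes and random weights, seeded for repeatability).
rng = random.Random(0)
dim = 4


def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]

cached = decode_cached(tokens, wq, wk, wv)
uncached = decode_uncached(tokens, wq, wk, wv)
```

The cache does not change what the model computes, only how much is recomputed: by the last step `k_cache` holds one entry per earlier token, which is the state the new token gets to look at.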