Hacker News
by netdur, 316 days ago
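Output tokens must be generated in order (autoregressive decoding): each one depends on the tokens before it. Input tokens don’t have that constraint, because the whole prompt is known up front, so prefill can run in parallel. That makes prefill speed largely an engineering question, and with stronger kernels, KV-cache handling, and batching, Claude can outrun Gemini.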
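To make the ordering constraint concrete, here is a toy sketch in Python. It is not how Claude, Gemini, or any real inference stack works: `toy_forward` and `generate` are hypothetical names, the "model" is just arithmetic, and the list standing in for the KV cache holds token ids rather than key/value tensors. It only counts how many sequential forward passes are needed.

```python
def toy_forward(tokens, kv_cache):
    """Hypothetical stand-in for one forward pass of a model.

    Appends one cache entry per input token, then returns a "next token"
    that depends on every cached position (as attention would).
    """
    for t in tokens:
        kv_cache.append(t)  # stand-in for storing this position's key/value
    return sum(kv_cache) % 50


def generate(prompt, n_new):
    """Generate n_new (>= 1) tokens and count sequential forward passes."""
    kv_cache = []
    passes = 0

    # Prefill: every prompt token is already known, so a single pass can
    # process all of them together, however long the prompt is.
    next_tok = toy_forward(prompt, kv_cache)
    passes += 1
    out = [next_tok]

    # Decode: the token just produced is the input to the next pass,
    # so these passes cannot be run side by side.
    while len(out) < n_new:
        next_tok = toy_forward([out[-1]], kv_cache)
        passes += 1
        out.append(next_tok)

    return out, passes


short_out, short_passes = generate([1, 2, 3], 3)
long_out, long_passes = generate(list(range(1000)), 3)
print(short_passes, long_passes)
```

In this sketch a 3-token prompt and a 1000-token prompt both need the same number of sequential passes to produce 3 output tokens. The pass count grows with output length, not input length. The cache is what lets each decode pass take only the newest token as input instead of reprocessing the whole sequence.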