Hacker News
by pinter69 306 days ago
Your question is very interesting.

Going over the comments, the only plausible explanation I could see is that the KV cache is extremely useful, though I don't know if that's really the whole story.

Would love to know the true answer to the question.