Hacker News
by dlivingston 60 days ago
What is being discussed is KV caching [0], which virtually every transformer-based LLM uses to cut the attention compute for each newly generated token from O(n^2) to O(n), where n is the number of tokens in the context so far. This is not specific to Claude or Anthropic.
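To make the O(n^2) vs O(n) point concrete, here is a rough toy sketch of a single attention head in plain Python. It is not how any real inference stack is written: the dimension `D`, the random weight matrices, and the helper names (`matvec`, `attend`, `step_without_cache`, `step_with_cache`) are all made up for illustration. It only counts query-key score computations per decoding step and checks that both paths produce the same output for the newest token.

```python
import math
import random

D = 4  # toy embedding size (illustrative assumption)


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(q, keys, values):
    """Softmax attention of one query over the given keys and values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [sum(e * v[i] for e, v in zip(exps, values)) / z for i in range(D)]


def step_without_cache(tokens, wq, wk, wv):
    """Recompute causal attention for every position in the sequence.

    Returns the output for the last position and the number of
    query-key scores computed, which is n * (n + 1) / 2.
    """
    qs = [matvec(wq, x) for x in tokens]
    ks = [matvec(wk, x) for x in tokens]
    vs = [matvec(wv, x) for x in tokens]
    n_scores = 0
    out = None
    for i, q in enumerate(qs):
        out = attend(q, ks[: i + 1], vs[: i + 1])
        n_scores += i + 1
    return out, n_scores


def step_with_cache(x, cache, wq, wk, wv):
    """Project only the new token, append its K/V to the cache, attend once.

    Returns the output for the new token and the number of
    query-key scores computed, which is n.
    """
    cache["k"].append(matvec(wk, x))
    cache["v"].append(matvec(wv, x))
    q = matvec(wq, x)
    return attend(q, cache["k"], cache["v"]), len(cache["k"])


rng = random.Random(0)


def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]


wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(8)]

cache = {"k": [], "v": []}
for t in range(1, len(tokens) + 1):
    out_full, cost_full = step_without_cache(tokens[:t], wq, wk, wv)
    out_cached, cost_cached = step_with_cache(tokens[t - 1], cache, wq, wk, wv)
    # Both paths must agree on the newest token's attention output.
    assert all(abs(a - b) < 1e-9 for a, b in zip(out_full, out_cached))
    print(t, cost_full, cost_cached)
```

The second column grows quadratically with the sequence length and the third grows linearly, while the outputs stay identical. The trade-off is memory: the cache holds one key and one value vector per past token, per layer and head in a real model.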
[0]: https://huggingface.co/blog/not-lain/kv-caching