HN Mirror

Hacker News


	by dlivingston 60 days ago
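	What is being discussed is KV caching [0], which virtually every transformer-based LLM uses at inference time. The keys and values already computed for earlier tokens are stored and reused, so generating each new token costs O(n) in the sequence length n, instead of the O(n^2) needed to recompute attention over the whole prefix. This is not specific to Claude or Anthropic.

	[0]: https://huggingface.co/blog/not-lain/kv-caching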
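To make the complexity claim concrete, here is a minimal pure-Python sketch. It assumes a single attention head in a single layer with random weights (no MLP, no multi-layer stack, no batching), and `ToyAttention`, `causal_forward`, `cached_step` and `demo` are hypothetical names for illustration, not any library's API. It counts query-key scores as a rough proxy for attention compute and checks that the cached and uncached paths produce the same output for each new token.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # W is a list of rows, so this computes W @ x
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention for one query: softmax(q.k / sqrt(d)) @ V
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


class ToyAttention:
    """Hypothetical single-head, single-layer causal attention with random weights."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

        self.Wq, self.Wk, self.Wv = rand_matrix(), rand_matrix(), rand_matrix()

    def causal_forward(self, xs):
        # No cache: re-project every token and re-run causal attention over the
        # whole prefix. Position t attends to t keys, so one step computes
        # n(n+1)/2 query-key scores: O(n^2) per generated token.
        keys = [matvec(self.Wk, x) for x in xs]
        values = [matvec(self.Wv, x) for x in xs]
        out, n_scores = None, 0
        for t, x in enumerate(xs):
            out = attend(matvec(self.Wq, x), keys[: t + 1], values[: t + 1])
            n_scores += t + 1
        return out, n_scores  # only the newest position's output is needed

    def cached_step(self, x, cache):
        # KV cache: project only the new token, append its key/value, and attend
        # once over the cached prefix, so one step computes n scores: O(n).
        cache["k"].append(matvec(self.Wk, x))
        cache["v"].append(matvec(self.Wv, x))
        out = attend(matvec(self.Wq, x), cache["k"], cache["v"])
        return out, len(cache["k"])


def demo(n=8, d=4):
    rng = random.Random(1)
    layer = ToyAttention(d)
    xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    cache = {"k": [], "v": []}
    naive_total = cached_total = 0
    for t in range(1, n + 1):
        out_naive, s_naive = layer.causal_forward(xs[:t])
        out_cached, s_cached = layer.cached_step(xs[t - 1], cache)
        # Both paths must give the same output for the newest token
        assert all(math.isclose(a, b) for a, b in zip(out_naive, out_cached))
        naive_total += s_naive
        cached_total += s_cached
    return naive_total, cached_total


print(demo())  # (120, 36): query-key scores computed over 8 decoding steps
```

Summed over n steps, the uncached path computes n(n+1)(n+2)/6 scores versus n(n+1)/2 with the cache, i.e. O(n^3) versus O(n^2) for the whole generation, which is the same O(n^2) to O(n) reduction per step. The trade-off is memory: the cache holds one key and one value vector per previous token (per layer and per head in a real model).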