Hacker News
10M Tokens LLM Context (github.com)
2 points by nsky-world 872 days ago
1 comment

KVQuant: Towards Enabling 10 Million Context Length For LLM Inference through KV Cache Quantization