by jychang 248 days ago
Tensor parallelism: the attention heads are sharded across GPUs, so each GPU only needs to store a fraction of the KV cache.
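
To put rough numbers on that, here is a back-of-the-envelope sketch. The function name, parameters, and model shape are all hypothetical, and it assumes KV heads are split evenly across GPUs with fp16 entries (2 bytes each); real serving stacks add paging and allocator overhead on top.

```python
def kv_cache_bytes_per_gpu(num_layers, num_kv_heads, head_dim,
                           seq_len, batch_size, tp_degree, bytes_per_elem=2):
    """Approximate KV cache size held by one GPU under tensor parallelism."""
    # Each GPU keeps only its share of the KV heads (ceiling division).
    # Assumption: if there are fewer KV heads than GPUs, a head is replicated,
    # so every GPU still holds at least one.
    heads_per_gpu = max(1, -(-num_kv_heads // tp_degree))
    # Factor of 2 covers the separate K and V tensors.
    return (2 * num_layers * heads_per_gpu * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape: 32 layers, 32 KV heads, head_dim 128, 4096 tokens.
single_gpu = kv_cache_bytes_per_gpu(32, 32, 128, 4096, 1, tp_degree=1)
eight_gpus = kv_cache_bytes_per_gpu(32, 32, 128, 4096, 1, tp_degree=8)
print(single_gpu / 2**30, "GiB on one GPU")
print(eight_gpus / 2**30, "GiB per GPU with 8-way tensor parallelism")
```

Note the saving stops once the tensor-parallel degree exceeds the number of KV heads (as in GQA/MQA models), since the heads then get replicated rather than split.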