


	by andy_ppp 283 days ago
	How does this work with anything but trivially small context sizes!?

1 comment

Tensor parallelism: the attention heads are sharded across GPUs, so each GPU only needs to store its own fraction of the KV cache.
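
To put rough numbers on that, here is a minimal sketch of the per-GPU KV cache arithmetic. It assumes the usual layout (one K and one V tensor per layer, sharded evenly by KV head across the tensor-parallel group) and ignores weights, activations, and allocator overhead. The function name, parameters, and model shape below are hypothetical, picked only for illustration, not taken from the system under discussion.

```python
def kv_cache_bytes_per_gpu(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # assumption: fp16/bf16 cache
    tp_degree: int = 1,       # number of GPUs in the tensor-parallel group
) -> int:
    """Estimate the KV cache bytes held by one GPU under tensor parallelism."""
    if n_kv_heads % tp_degree != 0:
        # Assumption: heads are split evenly; uneven or replicated heads
        # (e.g. more GPUs than KV heads) are out of scope for this sketch.
        raise ValueError("n_kv_heads must be divisible by tp_degree")
    heads_per_gpu = n_kv_heads // tp_degree
    # Factor of 2 covers the separate key and value tensors in each layer.
    return (2 * n_layers * heads_per_gpu * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape: 32 layers, 32 KV heads, head_dim 128, 4096 tokens.
single_gpu = kv_cache_bytes_per_gpu(32, 32, 128, 4096)
eight_gpus = kv_cache_bytes_per_gpu(32, 32, 128, 4096, tp_degree=8)
print(single_gpu / 2**30, "GiB on one GPU")       # 2.0 GiB
print(eight_gpus / 2**20, "MiB per GPU at TP=8")  # 256.0 MiB
```

The cache still grows linearly with context length; tensor parallelism just divides the per-GPU share by the number of GPUs, so the same memory budget stretches to a proportionally longer context.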