HN Mirror



	by andersa 702 days ago
	Usually you want to split each layer across GPUs with tensor parallelism, which works best when each KV head can be assigned to a specific GPU. All currently popular models have a power-of-two number of KV heads, so the GPU count should divide that number evenly.
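
A rough sketch of that divisibility point, in plain Python. This is not any framework's API: `assign_kv_heads` is a hypothetical helper, and the head and GPU counts in the usage note are illustrative assumptions, not taken from a specific model.

```python
def assign_kv_heads(num_kv_heads: int, num_gpus: int) -> list[list[int]]:
    """Split KV head indices into equal contiguous groups, one group per GPU.

    Hypothetical helper for illustration only. Raises ValueError when the
    heads cannot be divided evenly, which is the case the comment warns about.
    """
    if num_kv_heads <= 0 or num_gpus <= 0:
        raise ValueError("num_kv_heads and num_gpus must be positive")
    if num_kv_heads % num_gpus != 0:
        raise ValueError(
            f"{num_kv_heads} KV heads cannot be split evenly across {num_gpus} GPUs"
        )
    heads_per_gpu = num_kv_heads // num_gpus
    # GPU g owns heads [g * heads_per_gpu, (g + 1) * heads_per_gpu).
    return [
        list(range(gpu * heads_per_gpu, (gpu + 1) * heads_per_gpu))
        for gpu in range(num_gpus)
    ]
```

With an assumed 8 KV heads, `assign_kv_heads(8, 4)` gives each of 4 GPUs two heads, while `assign_kv_heads(8, 3)` raises `ValueError` because 3 does not divide 8. Real implementations may handle the uneven case differently (for example by replicating heads); this sketch simply rejects it.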

1 comment

interesting, thank you for the pointers.