| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by nodja 4 days ago
	Pipeline parallelism. Instead of splitting layers by row/column. You split at the layer edges. So instead of having this huge bottleneck of bandwidth you only need to transfer about 4KB per token when changing devices on a model like Qwen 3 30BA3.