| HN Mirror



	by wmf 476 days ago
	The weights don't go over the network, so performance is OK.


atwrk 475 days ago

If I'm not mistaken, producing each token requires roughly the whole model's worth of memory transfers (MoE models being the exception). Isn't that why memory bandwidth matters so much in the first place?
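A rough back-of-the-envelope sketch of this argument, assuming every active weight is read from memory exactly once per generated token: memory bandwidth divided by the bytes of active weights gives an upper bound on single-stream decode speed. The function name and the example figures (70B parameters, 2 bytes per parameter, 1000 GB/s) are hypothetical. For an MoE model, pass only the parameters active per token, which is why those models are the exception.

```python
def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on tokens/s when each token streams all active weights once.

    Hypothetical helper: ignores KV-cache reads, compute limits and caching.
    For an MoE model, params_billion should be the *active* parameter count.
    """
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical dense 70B model in fp16 on a 1000 GB/s memory system
print(decode_tokens_per_second(70, 2, 1000))
```

Halving the bytes per parameter (e.g. through quantization) or the active parameter count doubles this bound, which is the intuition behind bandwidth being the limiting factor.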


wmf 475 days ago

My understanding is that if you can store 1/Nth of the weights in RAM on each of the N nodes, then there's no need to send the weights over the network.
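As a sketch of the sizing side of this, assuming each node keeps a fixed 1/N share of the weights resident in RAM: the smallest N is the weight size divided by the usable RAM per node, rounded up. The helper name, the 10% reserve for KV cache, activations and the OS, and the example figures are all hypothetical assumptions, not measured numbers.

```python
import math

def min_nodes_to_fit(params_billion, bytes_per_param, ram_gb_per_node, reserve_fraction=0.1):
    """Smallest N such that a 1/N share of the weights fits in each node's RAM.

    reserve_fraction is a hypothetical allowance for KV cache, activations
    and the OS; it is an assumption, not a measured figure.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes/param = GB
    usable_gb = ram_gb_per_node * (1 - reserve_fraction)
    return math.ceil(weight_gb / usable_gb)

# Hypothetical 70B model in fp16 (about 140 GB) on nodes with 64 GB of RAM each
print(min_nodes_to_fit(70, 2, 64))  # 3
```

Once the shards are loaded, they stay put; only the per-token intermediate state has to move between nodes.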


unsatchmo 475 days ago

You're correct about the weights: each machine could in fact store all of the weights. However, I think you still have to transfer the activations and the KV cache while performing inference.
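A toy sketch of the activation traffic, assuming a simple contiguous (pipeline-style) split where each node owns a block of layers: the weights never leave their node, but the hidden-state vector for each token is sent across every node boundary. The layer function (matrix-vector product plus tanh), the sizes, and the 2-bytes-per-value figure are hypothetical stand-ins for real transformer blocks, and this toy does not model the KV cache at all.

```python
import math
import random

def make_layers(num_layers, hidden, seed=0):
    """Random square weight matrices standing in for transformer blocks."""
    rng = random.Random(seed)
    scale = 1 / math.sqrt(hidden)
    return [
        [[rng.gauss(0, scale) for _ in range(hidden)] for _ in range(hidden)]
        for _ in range(num_layers)
    ]

def split_contiguous(layers, n_nodes):
    """Node i keeps layers [i*L//N, (i+1)*L//N); its weights never leave it."""
    total = len(layers)
    return [layers[i * total // n_nodes:(i + 1) * total // n_nodes] for i in range(n_nodes)]

def forward_one_token(shards, x, bytes_per_value=2):
    """Run one token through every node; return the output and bytes sent between nodes."""
    sent = 0
    for i, shard in enumerate(shards):
        for w in shard:
            # toy layer: matrix-vector product followed by tanh
            x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
        if i < len(shards) - 1:
            # only the hidden-state vector crosses to the next node
            sent += len(x) * bytes_per_value
    return x, sent

layers = make_layers(num_layers=8, hidden=16)
shards = split_contiguous(layers, n_nodes=4)
out, sent = forward_one_token(shards, [1.0] * 16)
weight_values_per_node = sum(len(w) * len(w[0]) for w in shards[0])
print(sent, weight_values_per_node)  # 96 512
```

Per token, the network carries (N - 1) hidden-state vectors, which is tiny compared with the weights each node reads locally; whether the KV cache also has to move depends on how the model is split.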
