| HN Mirror

	by georgehotz 702 days ago
	That OCP 3.0 card has the same link bandwidth as the GPUs, so you can scale out without much loss of all-reduce bandwidth. In practice, for all models except the largest, the ~16GB/s all-reduce is totally fine. You just need to make sure you can all-reduce all weights within your training step time. Say you are training a 3B parameter model in BF16. That's 6GB of weights, which takes about 375ms to all-reduce at ~16GB/s, so as long as your step time is >=500ms you won't see a slowdown.
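The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope check, not a model of any real interconnect: it assumes the ~16GB/s figure is the effective rate at which the full payload gets reduced, that 1GB is 1e9 bytes, and that the all-reduce has to fit inside one step. The function name `allreduce_seconds` is hypothetical and only used for illustration.

```python
def allreduce_seconds(params: float,
                      bytes_per_param: int = 2,      # BF16 is 2 bytes per parameter
                      bandwidth_gb_s: float = 16.0   # assumed effective all-reduce rate
                      ) -> float:
    """Estimate seconds needed to all-reduce every weight once."""
    payload_gb = params * bytes_per_param / 1e9  # 3e9 params * 2 bytes = 6 GB
    return payload_gb / bandwidth_gb_s


# The 3B-parameter BF16 case from the comment above.
t = allreduce_seconds(3e9)
print(f"all-reduce time: {t * 1000:.0f} ms")    # all-reduce time: 375 ms
print(f"fits in a 500 ms step: {t <= 0.5}")     # fits in a 500 ms step: True
```

Under these assumptions the 500ms step time leaves some headroom over the raw transfer time. Changing `params` or `bandwidth_gb_s` shows how quickly that margin shrinks for larger models.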

1 comment

warkdarrior 702 days ago

> 3B parameter model

That's tiny. Can it train/fine-tune 70B models?