by georgehotz
655 days ago
That OCP 3.0 card has the same link bandwidth as the GPUs, so you can scale out without losing much all-reduce bandwidth. In practice, the ~16GB/s all-reduce is totally fine for all but the largest models. You just need to be able to all-reduce all the weights within your training step time. Say you're training a 3B parameter model in BF16. That's 6GB of weights, which takes about 375ms at ~16GB/s, so as long as your step time is >=500ms you won't see a slowdown.
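The arithmetic above can be sketched in a few lines of Python. This is a rough back-of-the-envelope model, not a benchmark: it assumes the full set of weights is moved at the effective ~16GB/s all-reduce rate quoted above, and it ignores latency, overlap with compute, and any compression. The function names and defaults are hypothetical, chosen only for illustration.

```python
def allreduce_seconds(params: int,
                      bytes_per_param: int = 2,      # BF16 = 2 bytes per parameter
                      bandwidth_gb_s: float = 16.0   # assumed effective all-reduce rate
                      ) -> float:
    """Estimated time to all-reduce every weight once, in seconds."""
    total_bytes = params * bytes_per_param
    return total_bytes / (bandwidth_gb_s * 1e9)


def fits_in_step(params: int, step_seconds: float, **kwargs) -> bool:
    """True if the all-reduce estimate fits inside one training step."""
    return allreduce_seconds(params, **kwargs) <= step_seconds


if __name__ == "__main__":
    for name, params in [("3B", 3_000_000_000), ("70B", 70_000_000_000)]:
        t = allreduce_seconds(params)
        print(f"{name}: {t * 1000:.0f} ms per all-reduce, "
              f"fits in a 500 ms step: {fits_in_step(params, 0.5)}")
```

Under these assumptions the 3B BF16 case comes out at about 375ms, which is under the 500ms step time. The same estimate grows linearly with parameter count, so larger models need proportionally longer steps to hide the all-reduce.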
|
That's tiny. Can it train/fine-tune 70B models?