Hacker News
by anon389r58r58 655 days ago
The networking of the tinybox is woefully inadequate: it only has a single OCP 3.0 interface, which is unoccupied. If you can fit everything onto one tinybox, you'll be fine; if you cannot, you'd be better off with a more professional workstation solution, e.g. NVIDIA RTX cards, which have more memory.

That OCP 3.0 card has the same link bandwidth as the GPUs, so you can scale out without much loss of all-reduce bandwidth. In practice, for all models except the largest, the ~16GB/s all-reduce is totally fine. You just need to make sure you can all-reduce all of the weights within your training step time.

Say you are training a 3B parameter model in BF16. That's 6GB of weights, so as long as your step time is >=500ms you won't see a slowdown.
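For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of that arithmetic. It assumes the simplest possible model: transfer time is just payload size divided by the ~16GB/s figure quoted above, ignoring the extra traffic of a real ring all-reduce, latency, and any compute/communication overlap. The function names are made up for illustration and are not part of any library.

```python
def allreduce_seconds(num_params: float,
                      bytes_per_param: int = 2,
                      bandwidth_gb_s: float = 16.0) -> float:
    """Estimate the time to move one full copy of the weights.

    Assumption: time = payload / bandwidth. A real all-reduce moves
    somewhat more data than this, so treat the result as a lower bound.
    bytes_per_param=2 corresponds to BF16.
    """
    payload_gb = num_params * bytes_per_param / 1e9
    return payload_gb / bandwidth_gb_s


def fits_in_step(num_params: float,
                 step_time_s: float,
                 bytes_per_param: int = 2,
                 bandwidth_gb_s: float = 16.0) -> bool:
    """True if the estimated all-reduce finishes within one training step."""
    estimate = allreduce_seconds(num_params, bytes_per_param, bandwidth_gb_s)
    return estimate <= step_time_s


if __name__ == "__main__":
    # The 3B-parameter BF16 example from the comment above: 6GB of weights.
    t = allreduce_seconds(3e9)
    print(f"estimated all-reduce time: {t * 1000:.0f} ms")
    print("fits in a 500 ms step:", fits_in_step(3e9, 0.5))
```

Under these assumptions the 6GB payload needs roughly 375ms, which is why a step time of 500ms or more leaves some headroom.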

> 3B parameter model

That's tiny. Can it train/fine-tune 70B models?