Hacker News new | ask | show | jobs
by StrangeDoctor 655 days ago
I think the idea is you network them together if you need more and most models can be split nicely.
2 comments

For that you'd probably be better off removing one of the GPUs, and replacing it with a networking card.

The problem of the form factor will remain. The tinybox is 15U big for compute that you'd normally expect to find in a 4U form factor.

I don't think they're intended for rack usage like that. More like for people to put under their desks... there would be no reason to build the giant case with fancy silent-ish cooling if you're going to put them next to your other jet engines.
Fully agree, and I think the tinybox is great if you put only one of them somewhere in your local office.

I just don't think it makes sense to connect multiple of them into a "cluster" to work with bigger models, as the networking bandwidth isn't good enough and you'd have to fit multiple of these big boxes into your local space. Then I might as well put up a rack in a separate room.

there's an ocp 3.0 mezzanine, so no need to remove a card and you'd get 200gbps, unless I've missed something about needing to remove a card to access it. But yeah stacking these or racking them seems less than ideal.
3kW under your desk... no need to turn on the heat in the winter!
Most models actually can't be split nicely by 6. There's a reason nvidia builds nodes with 4 and 8 GPUs.
I don't see why 6 is inherently worse than 4 or 8, not all of the layers are exactly equal or a power of 2 in count. 2^2, 2^3, vs 2^1*3^1 might give you more options.

The main issue I run into mainly is flops vs ram in any given card/model.

Usually you want to split each layer to run with tensor parallelism, which works optimally if you can assign each kv head to a specific GPU. All currently popular models have a power of 2 number of kv heads.
interesting, thank you for the pointers.