by vessenes 657 days ago
I thought to myself this morning: "boy, that $15k price tag is tempting." Then I thought to myself "how many times have I downloaded a GitHub repo only to hand-replace cuda with mps, and then tried to figure out if there's a version of xformers that will work this week with my M3?" and then I thought "boy, that $25k is tempting." (15k: Radeon / 25k: Nvidia).

For those wondering about power: it draws 3200W. For residential / low-end commercial settings in the US, they say you'll need two separate circuits. There is also a built-in power-limiting utility in the OS that lets you safely run on one circuit at reduced speed.

The only part of this that gives me pause is interconnect -- over PCIe, 64GB/s stated. This is much, much lower than InfiniBand -- can any ML engineers comment on using this box for a full finetune of, say, Llama 3.1 70B?

3 comments

You can't fine tune a 70B model with this. It barely even fits the weights before it runs out of vram. Need a bigger machine.
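A rough back-of-the-envelope sketch of the memory claim above. The 6 × 24 GB VRAM figure and the 16 bytes per parameter for mixed-precision Adam are assumptions for illustration, not numbers from the thread, and activations are ignored entirely:

```python
# Rough VRAM estimate for a full fine-tune; all figures are illustrative assumptions.

GB = 1e9


def weights_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone (BF16 = 2 bytes per parameter)."""
    return params * bytes_per_param / GB


def full_finetune_gb(params: float) -> float:
    """Very rough mixed-precision Adam footprint, ignoring activations.

    Assumes 2 bytes of weights + 2 bytes of gradients + 12 bytes of
    optimizer state (FP32 master weights plus two Adam moments) per parameter.
    """
    return params * (2 + 2 + 12) / GB


PARAMS_70B = 70e9
TOTAL_VRAM_GB = 6 * 24  # assumption: six cards with 24 GB each

print(weights_gb(PARAMS_70B))        # 140.0
print(full_finetune_gb(PARAMS_70B))  # 1120.0
print(TOTAL_VRAM_GB)                 # 144
```

Under these assumptions the BF16 weights alone take almost all of the VRAM, and a full fine-tune needs several times more.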
I think the idea is you network them together if you need more and most models can be split nicely.
For that you'd probably be better off removing one of the GPUs, and replacing it with a networking card.

The problem of the form factor will remain. The tinybox is 15U big for compute that you'd normally expect to find in a 4U form factor.

I don't think they're intended for rack usage like that. More like for people to put under their desks... there would be no reason to build the giant case with fancy silent-ish cooling if you're going to put them next to your other jet engines.
Fully agree, and I think the tinybox is great if you put only one of them somewhere in your local office.

I just don't think it makes sense to connect multiple of them into a "cluster" to work with bigger models, as the networking bandwidth isn't good enough and you'd have to fit multiple of these big boxes into your local space. Then I might as well put up a rack in a separate room.

there's an ocp 3.0 mezzanine, so no need to remove a card and you'd get 200gbps, unless I've missed something about needing to remove a card to access it. But yeah stacking these or racking them seems less than ideal.
3kW under your desk... no need to turn on the heat in the winter!
Most models actually can't be split nicely by 6. There's a reason nvidia builds nodes with 4 and 8 GPUs.
I don't see why 6 is inherently worse than 4 or 8, not all of the layers are exactly equal or a power of 2 in count. 2^2, 2^3, vs 2^1*3^1 might give you more options.

The main issue I run into is FLOPS vs. RAM in any given card/model.

Usually you want to split each layer to run with tensor parallelism, which works optimally if you can assign each kv head to a specific GPU. All currently popular models have a power of 2 number of kv heads.
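The divisibility point can be sketched in a few lines. The helper name and the KV head count of 8 are assumptions chosen for illustration (a power of 2, as described above); real models may differ:

```python
from typing import Optional


def kv_heads_per_gpu(num_kv_heads: int, num_gpus: int) -> Optional[int]:
    """Return KV heads per GPU if they divide evenly, else None."""
    if num_kv_heads % num_gpus == 0:
        return num_kv_heads // num_gpus
    return None


NUM_KV_HEADS = 8  # assumption: a power-of-2 KV head count

for gpus in (2, 4, 6, 8):
    print(gpus, kv_heads_per_gpu(NUM_KV_HEADS, gpus))
# 2 4
# 4 2
# 6 None
# 8 1
```

With 6 GPUs there is no even assignment, so some GPUs would have to hold more heads than others or heads would have to be split.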
interesting, thank you for the pointers.
The networking of the tinybox is woefully inadequate, i.e. it only has an OCP 3.0 interface, which is unoccupied. If you can fit everything onto one tinybox, you'll be fine; if you can't, you'd be better off with a more professional workstation solution, e.g. NVIDIA RTX cards, which have more memory.
That OCP 3.0 card has the same link bandwidth as the GPUs, so you can scale out without much loss of all-reduce bandwidth. In practice, for all models except the largest, the ~16GB/s all-reduce is totally fine. You just need to make sure you can all-reduce all weights in your training step time.

Say you are training a 3B parameter model in BF16. That's 6GB of weights, as long as your step time is >=500ms you won't see a slowdown.
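The arithmetic behind that estimate, as a minimal sketch. It assumes the simplest possible model: all-reduce time is the weight size divided by the effective ~16GB/s all-reduce bandwidth quoted above, with no overlap and no latency term. The function names are made up for illustration:

```python
def weight_bytes(params: float, bytes_per_param: int = 2) -> float:
    """Size of the weights in bytes (BF16 = 2 bytes per parameter)."""
    return params * bytes_per_param


def allreduce_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to all-reduce num_bytes at a given effective bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


ALLREDUCE_BW = 16e9  # ~16 GB/s effective all-reduce bandwidth, from the comment

print(allreduce_seconds(weight_bytes(3e9), ALLREDUCE_BW))   # 0.375
print(allreduce_seconds(weight_bytes(70e9), ALLREDUCE_BW))  # 8.75
```

So a 3B model needs about 0.375 s per all-reduce, which fits inside a 500ms step. The same simplified model gives several seconds per step for 70B.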

> 3B parameter model

That's tiny. Can it train/fine-tune 70B models?

a 220 volt 20 amp circuit should be good for over 3500 watts constant load in North America. Why is it requiring two circuits?
Most likely what they actually mean is:

This server has two IEC C20 connectors, rated for ~16 amps, each feeding a PSU rated for 1600W (i.e. 16A @ 100v)

If you're plugging in to 110v you shouldn't plug them both into the same outlet, as a 20A circuit can't supply 32A.

As each PSU is rated for 1600W you'll have to plug both in to get 3200W even if you're running on 220v - although they'd only draw ~7.2A each in that case.
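The current draw can be checked with a quick sketch. It assumes each PSU pulls its full rated 1600W from the wall (ignoring efficiency losses) and compares the total against the raw 20A breaker rating without any continuous-load derating:

```python
def current_amps(watts: float, volts: float) -> float:
    """Current drawn by a load of `watts` at `volts` (I = P / V)."""
    return watts / volts


def fits_on_circuit(total_amps: float, breaker_amps: float) -> bool:
    """True if the total draw stays within the breaker rating."""
    return total_amps <= breaker_amps


PSU_WATTS = 1600  # per-PSU rating from the comment above

for volts in (110, 220):
    per_psu = current_amps(PSU_WATTS, volts)
    total = 2 * per_psu
    fits = fits_on_circuit(total, 20)
    print(f"{volts} V: {per_psu:.1f} A per PSU, {total:.1f} A total, fits on 20 A: {fits}")
# 110 V: 14.5 A per PSU, 29.1 A total, fits on 20 A: False
# 220 V: 7.3 A per PSU, 14.5 A total, fits on 20 A: True
```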

US Residential 220v dryer outlets are usually wired one-circuit-to-one-outlet, and multi-way adaptors are discouraged. So although plugging two 7.2A loads into a single 20A feed would work from a current perspective (and indeed it's common in Europe), I don't know how easy it is to do legally.

If you're in a data centre with a 3-phase 220v power you probably know what you're doing. Your UPS guy will probably thank you if you split your load over two phases instead of putting the whole load onto one phase.

Imagine dropping $15k on this but not wanting to spend $800 on an electrician to properly wire a 50A circuit so you run extension cords across the room creating a fire hazard.

As for the datacenter (I’ve racked many things with A/B power) the entire point is redundancy which this defeats the purpose of since each PSU is not properly rated. Seems incredibly bizarre to me in so many ways.

> As for the datacenter (I’ve racked many things with A/B power) the entire point is redundancy which this defeats the purpose of. Seems incredibly bizarre to me in so many ways.

Yes - often for the data centre you'd end up with something like [1] with 4x 2700W power supplies, providing redundancy and ample power at the same time. It does mean you need four 220v power feeds though.

[1] https://www.supermicro.com/en/products/system/gpu/4u/sys-421...

Why is it a fire hazard if the extension cords are properly rated for this load?
Extension cords are only supposed to be used for 90 days or so, you're technically violating the NEC if you're using them in a permanent installation.
You can feed a US outlet the split phase 240V and get two 120V@20A each.

It used to be done in kitchens in the US, back when appliances were power hungry. I have done so in my workshop for the same reason.

Houses are wired in split phase 240V, with the neutral in the middle. That is, you have two opposite 120V phases, around the same neutral.

This is a clever way to double the power, while adding a single wire.

In the US the standard outlet receptacle has two outlets. Bring the same neutral to the two outlets, and assign one phase per outlet (outlets have metal tabs you can break off, so you don't need any extra wiring).

At the panel, you have a dual breaker. One breaker per phase, with a physical linkage forcing them to trip and arm together at once.

As a side benefit, though a very unsafe one, you can make up a Y that plugs into the two 120V outlets and gives you a single 240V receptacle. This is unsafe because if you plug in only one of the 120V plugs, the other one now has 120V on its exposed phase prong! On the other hand, I have both 240V@20A and 2×120V@20A anywhere in the shop ;)

Skip the Y hack and do it in style, legally!

https://store.leviton.com/products/duplex-receptacle-outlet-...

I am aware of this. But then I have a single 120@20 vs a single 240@20.

With my setup I have 2×120@20 always available, and 240@20 for the occasional welding.

I could assign a different 120 phase to every other outlet but then I would need some clear identification.

The two phases are assigned to the top and bottom outlets the same way all around the shop. If I need to run two high amperage machines, I only have to remember to use one bottom and one top outlet.

If you're talking about a workshop and anticipating that much ad-hoc power usage, I'd just put two dual 6-20 receptacles side by side rather than splitting one. And then since you're actually creating the premises wiring, stick an (L)14-20R next to them in parallel and get rid of your need to fuss with hacky combiner cords. At least that's what I plan to do when I have the time for such luxuries.
I only very rarely need 240V, if I had permanently mounted 240V machines or frequent needs, I would do exactly what you propose.
GFCI requirements will interfere with the legality of many modern-day multi-wire branch circuit plans, yeah?
You can get a two-pole GFCI breaker for this purpose. The prices are a bit silly.
Two-pole breakers, 2×120V@20A, prices in USD:

    - plain, $20
    - GFCI, $115
    - GFCI + AFCI, $115
Yes it is expensive, but it can also save your life.
These are not redundant PSUs, each PSU powers different GPUs in the same machine. Are you sure connecting them to different phases is a good idea?

I've been looking for a proper answer to this for a while, because I want to build a similar machine with 8 GPUs (~4500W max load) which would need to be split between two 16A 230V circuits.

The transformers in the power supplies provide 'isolation' between the input and output - which means you can connect the outputs together, even when the inputs are on different phases.

Are you planning to build such a machine for your personal home use? If so you should be aware that (a) you might find server hardware hasn't thoroughly tested compatibility with things like suspend; (b) you might find games haven't thoroughly tested compatibility with multi-GPU setups; and (c) you might find the idle power consumption is 200W or more, even while doing nothing.

It's for personal use, though it would not run any games, it would be for running offline inference and other experiments. Probably not a smart purchase, but a fun one...

That is good to know multiple phases can work. Perhaps there would still be a fire risk in case of a short? Like somehow bridging the circuits > breakers don't trip?

Keep in mind GPUs (and the rest of the computer) run on DC, not AC, so there is no phase by the time it comes to your computer. The PSU will step down the AC to the right voltage and then rectify it into DC, and they do that independently so whatever phase they started with shouldn't matter.

Something to keep in mind though is that (at least with consumer-grade PSUs) it is not safe to simply tie the outputs together, even if both PSUs produce 5V, 12V, 3.3V, etc. The voltages will be slightly different and connecting them together will cause current to flow back into one of the PSUs.

You can still use this setup though, the key is that the GPUs do not (or should not) connect the motherboard voltage provided via the card slot to the voltage provided via the power connector. This detail allows you to safely power the motherboard from one PSU and power the GPU from another one, you just have to be careful not to mix connectors on the same card between different PSUs (if it has multiple). Additionally the motherboard should be entirely powered from a single PSU.

Because most households in the US might have maybe 3 breakers setup this way, all of which are likely running critical infrastructure already.

Most folks aren’t going to unplug their water heater to turn on their AI.

Swapping an electric dryer around is maybe more practical. It also gives you an obvious place to dump the waste heat.

If I was serious about this I'd have an electrician and HVAC installer on the way first. A mini split in the computer room with a dedicated 50A/220v circuit.

I assume the people dropping $15k on one of these to have in their house are comfortable with paying an electrician to wire it in if necessary.
There are few or no 220 volt circuits in North America. Your choices in that range are 208V or 240V.

But yes, a power supply can draw around 240V times 20A = 4800VA, which is nearly 4800W if the power factor is close to 1. An office in an office building is more likely to have 208V.
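A small sketch of the VA versus W distinction. The 0.95 power factor and the 80% continuous-load derating are assumptions for illustration (the derating is a common rule of thumb for sizing continuous loads), not figures from the thread:

```python
def apparent_power_va(volts: float, amps: float) -> float:
    """Apparent power in volt-amperes."""
    return volts * amps


def real_power_w(volts: float, amps: float, power_factor: float) -> float:
    """Real power in watts for a given power factor (0..1)."""
    return volts * amps * power_factor


def continuous_limit_w(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Continuous load limit, assuming an 80% derating of the breaker rating."""
    return volts * breaker_amps * derate


print(f"{apparent_power_va(240, 20):.0f} VA")             # 4800 VA
print(f"{real_power_w(240, 20, 0.95):.0f} W at PF 0.95")  # 4560 W at PF 0.95
print(f"{continuous_limit_w(240, 20):.0f} W continuous")  # 3840 W continuous
print(f"{continuous_limit_w(208, 20):.0f} W continuous")  # 3328 W continuous
```

Under the derating assumption, a 20A circuit covers a 3200W load at either 208V or 240V.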

IIRC USA is 110V, not 220V?
I have a lot of 220V circuits. One is like 80A and powers a whole building. Also, almost all power comes into a home as 220V single phase from the local power distribution.

Water heater, heat pumps, stove, dryer, hot tub, etc are all 220.

Most US homes have at least one 220v split phase line for major appliances like stoves or AC.
Yes, but most homes don't have extra 220v outlets except for the ones provided for the specific appliances that need them.

So if you want to plug in a device like this "tinybox" at home, it's going to be a lot easier to find two separate 110v outlets on different circuits than to have a new 220v circuit added, or to unplug your stove every time you want to use it.

I don't know what adversarial relationship you have with electricians, but adding more 220v outlets is absolutely feasible. Usually takes an electrician a day of work.
Who needs a stove? My 3200W GPU box puts out more than enough heat to roast a chicken.
Most homes have a 240V supply with a neutral wire (V1, V2, N). This allows for split phase 120V power (V1+N, V2+N). You can also get 240V (V1+V2).

It's common for EVs, clothes dryers, ovens, and hot water heaters to use 240V while most other appliances are 120V.
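The V1/V2/N relationship can be sketched with phasors. This assumes ideal 120V RMS legs; the 120° case is included only to show where the 208V figure mentioned elsewhere in the thread comes from when two legs of a three-phase supply are used. The helper names are made up for illustration:

```python
import cmath
import math


def leg(volts_rms: float, phase_deg: float) -> complex:
    """Phasor for one hot leg measured against neutral."""
    return cmath.rect(volts_rms, math.radians(phase_deg))


def line_to_line(a: complex, b: complex) -> float:
    """RMS voltage measured between two hot legs."""
    return abs(a - b)


# Split phase: two 120 V legs, 180 degrees apart around the same neutral.
v1, v2 = leg(120, 0), leg(120, 180)
print(round(line_to_line(v1, v2), 1))  # 240.0

# Two legs of a three-phase supply: 120 V each, 120 degrees apart.
a, b = leg(120, 0), leg(120, 120)
print(round(line_to_line(a, b), 1))    # 207.8
```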

220V is American version of what is known as 380V/400V elsewhere.
US three-phase power is mostly 208V, 240V, and 480V. The 208V is what normal residential 120/240V split-phase was made from. 240V is high-leg delta three-phase, and I think it was an old alternative to split-phase. 480V is used for light industrial that needs more power.

There is nothing in the US power system that is 220V.

Ackshually, you need to tell that to the GP of the thread; they began using "220v".