by tedivm 2361 days ago
I've seen a demo of the machine. It's about 17u in size, with the vast majority (like 15u) of that being for cooling. This was over two years ago, so things may have changed.

Right now I'm hosting some DGXs, and only one datacenter in the Bay Area had the ability to power a full rack of them. Power density is going to be a real issue for these systems.


Wow, that really does add some perspective on the cooling, and the point about datacenter power requirements really highlights how far these types of systems are from the usual rack layouts.

Equally, the cooling capacity of the datacenter comes into play with such systems. Given the power density, the amount of heat being generated would likewise be above your normal rack output.

Yeah, kind of tangential, but it also ties in with how datacenters are transitioning from selling space to selling power. It used to be I'd just rent space by the rack or by the U, and then maybe pay extra for the network connection. Now the space itself is pretty cheap, and the network hookups are unbelievably cheap, but datacenters are actually paying attention to power consumption.

In the case of the DGX-1 I've had datacenters tell me I couldn't put more than two in a rack. We ended up finding a datacenter that specialized in them (Colovore, who I cannot recommend highly enough). Their power and cooling systems are some of the most impressive I've ever seen.

In most cases the cooling capacity is in fact the actual limit you are running up against. Getting more power into a rack is a simple matter of running more cable. Getting more power _out_ of the rack is a much more complicated issue to resolve.

I think it's a little more complicated than running more cables. Most datacenters have a total capacity they can handle, based on how many connections they have to their local grid (or grids, as datacenter hubs like Santa Clara have multiple power grids to give datacenters redundancy). You need to make sure your internal power distribution systems can actually handle the amount you want to push through, and you need to ensure that your backup power is actually enough to get you through major outages.

AWS, as an example, tends to only have 20MW to 30MW for each of their datacenters. Anything above that, they say, isn't worth the hassle when they can just open a new datacenter. Power is definitely a limiting factor.

Getting more power into a datacenter is a different problem than getting more (already available) power into a rack. I suppose I could have added "if your existing power distribution system can handle the extra power capacity". That includes service entrance, transfer switching, standby and backup power sources, and distribution to the rack level.

The point I'm trying to make is that, all things being equal, it's _much_ easier to handle unequal power load between individual racks than it is to deal with the cooling side of the equation. Adding more power to a single rack usually just means a few more whips from your distribution. Getting that one extra-hot rack in the aisle to be effectively cooled requires a lot more infrastructure than running some cables.
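
To put rough numbers on the cooling side, here is a back-of-the-envelope sketch. It assumes essentially all power drawn by a rack leaves as heat, and uses the common sensible-heat rule of thumb for air (BTU/hr ≈ 1.08 × CFM × ΔT in °F). The rack wattages and the 20 °F temperature rise are hypothetical values picked for illustration, not figures from any particular datacenter or machine.

```python
# Rough sketch: estimate the heat load and airflow needed to air-cool a rack.
# Assumption: all electrical power drawn by the rack ends up as heat.

WATTS_TO_BTU_PER_HR = 3.412  # 1 W is approximately 3.412 BTU/hr


def heat_btu_per_hr(rack_kw: float) -> float:
    """Heat output of a rack drawing rack_kw kilowatts, in BTU/hr."""
    return rack_kw * 1000 * WATTS_TO_BTU_PER_HR


def airflow_cfm(rack_kw: float, delta_t_f: float) -> float:
    """Approximate airflow (cubic feet per minute) needed to carry the heat away.

    Uses the rule of thumb BTU/hr = 1.08 * CFM * delta_T(F) for air
    near sea level; delta_t_f is the inlet-to-outlet temperature rise.
    """
    return heat_btu_per_hr(rack_kw) / (1.08 * delta_t_f)


if __name__ == "__main__":
    # Hypothetical rack loads (kW) and temperature rise, for illustration only.
    for rack_kw in (5, 35):
        cfm = airflow_cfm(rack_kw, delta_t_f=20)
        print(f"{rack_kw} kW rack: {heat_btu_per_hr(rack_kw):,.0f} BTU/hr, "
              f"about {cfm:,.0f} CFM at a 20 F rise")
```

The airflow scales linearly with the load, so one rack at several times the density of its neighbors needs several times the air moved through that one spot, which is the part that extra whips don't fix.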

I'm waiting for the high pressure helium filled datacenter.

Yes, getting more power into a datacenter is much easier than adding the extra cooling capacity to remove that power once it has turned into heat. But I'd imagine they would plan for and monitor that aspect, and may even have redundant cooling systems. It's certainly a potential gotcha, though, and one that would soon sort out the bad datacenters when they end up seeing all their hosting overheat and go offline.

I think this is also a paradigm problem. Modern chipset advancements are at the crossroads of power vs. cooling. The logical extension of that fight is greater power and cooling requirements in the DC, which it is not necessarily equipped to provide by default.

This is why I thought what Colovore did was pretty smart: they built liquid cooling into all of their racks. They are literally the densest datacenter I've found that actually allows people to colo with them (I'm sure there are plenty of companies who own their own datacenters that might be denser), but even with their systems you'd only be able to fit two of the Cerebras systems in a single rack (and you wouldn't be able to power both up 100% at the same time).

https://www.colovore.com/data-center/