by Zenst 2361 days ago
A chip that size, imagine the yield. Equally, cooling: it has to be water based, as a heatsink that size would be on par with a small anvil and the weight alone would cause some serious issues. Though I'm unsure, as there are no pictures of it in operation, alas, and all they say is "20 kilowatts being consumed by each blew out into the Silicon Valley streets through a hole cut into the wall", which does somewhat beg for a picture as just raises more questions.

Why would they make a chip this big when AMD is showing that a chiplet design approach is cheaper and more scalable on so many levels? Let alone yields.

Equally, Arm's approach to utilising the back of the chip for power delivery: https://spectrum.ieee.org/nanoclast/semiconductors/design/ar...

Then a wafer scale chip like this, using that approach, would save so much power. But again, yields will be a factor, and I can imagine this is not a cutting-edge process node, since yields improve as nodes mature. So an older node would have a better yield and be more suitable for such wafer scale chips. But again, no mention of what is used. I have read in the past that it would use Intel's 10nm, but this article mentions TSMC. Another article I read that they used a 16nm node ( https://fuse.wikichip.org/news/3010/a-look-at-cerebras-wafer... ), which, given the point above about node maturity, is understandable.
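
To put rough numbers on the yield worry, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density and both die areas are assumptions picked for illustration, not figures from either article.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects (simple Poisson model)."""
    return math.exp(-area_cm2 * defects_per_cm2)

D0 = 0.1  # assumption: defects per cm^2, illustrative only

small_die = poisson_yield(1.0, D0)      # assumption: ~1 cm^2 chiplet-sized die
wafer_scale = poisson_yield(462.0, D0)  # assumption: ~462 cm^2 wafer-scale die

print(f"small die, zero-defect yield:   {small_die:.3f}")
print(f"wafer scale, zero-defect yield: {wafer_scale:.3e}")
```

Under these assumed numbers a defect-free wafer-scale die is essentially impossible, so a design like this would have to tolerate defects rather than hope to avoid them.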

I've seen a demo of the machine. It's about 17u in size, with the vast majority (like 15u) of that being for cooling. This was over two years ago so things may have changed.

Right now I'm hosting some DGXs, and only one datacenter in the Bay Area had the ability to power a full rack of them. Power density is going to be a real issue for these systems.

Wow, that really does add some perspective on the cooling, and the datacenter power requirements really do highlight how out-there these types of systems are compared with the usual rack layouts.

Equally, the cooling capacity of the datacenter comes into play with such systems. Given the power density, the amount of heat being generated would equally be above your normal rack output.

Yeah, kind of tangential, but it also ties in with how datacenters are transitioning from selling space to selling power. It used to be I'd just rent space by the rack or by the U, and then maybe pay extra for the network connection. Now the space itself is pretty cheap, and the network hookups are unbelievably cheap, but datacenters are actually paying attention to power consumption.

In the case of the DGX-1 I've had datacenters tell me I couldn't put more than two in a rack. We ended up finding a datacenter that specialized in them (Colovore, who I cannot recommend highly enough). Their power and cooling systems are some of the most impressive I've ever seen.

In most cases the cooling capacity is in fact the actual limit you are running up against. Getting more power into a rack is a simple matter of running more cable. Getting more power _out_ of the rack is a much more complicated issue to resolve.
I think it's a little more complicated than running more cables. Most datacenters have a total capacity they can handle, based on how many connections they have to their local grid (or grids, as datacenter hubs like Santa Clara have multiple power grids to provide redundancy). You need to make sure your internal power distribution systems can actually handle the amount you want to push through, and you need to ensure that your backup power is actually enough to get you through major outages.

AWS, as an example, tends to only have 20MW to 30MW for each of their datacenters- anything above that they say isn't worth the hassle when they can just open a new datacenter. Power is definitely a limiting factor.

Getting more power into a datacenter is a different problem than getting more (already available) power into a rack. I suppose I could have added "if your existing power distribution system can handle the extra power capacity". That includes service entrance, transfer switching, standby and backup power sources, and distribution to the rack level.

The point I'm trying to make is that, all things being equal, it's _much_ easier to handle unequal power loads between individual racks than it is to deal with the cooling side of the equation. Adding more power to a single rack usually just means a few more whips from your distribution. Getting that one extra-hot rack in the aisle to be effectively cooled requires a lot more infrastructure than running some cables.
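
As a back-of-the-envelope illustration of why the cooling side bites, here is a sketch of the airflow needed to carry heat out of a rack, using Q = P / (rho * c_p * dT). The air properties are approximate, and the 15 K inlet-to-outlet temperature rise and the 5 kW "ordinary rack" load are assumptions. The 20 kW load is the per-system figure quoted upthread.

```python
AIR_DENSITY = 1.2          # kg/m^3, approximate for room-temperature air
AIR_SPECIFIC_HEAT = 1005   # J/(kg*K), approximate
M3S_TO_CFM = 2118.88       # 1 m^3/s expressed in cubic feet per minute

def airflow_m3_per_s(heat_watts: float, delta_t_kelvin: float) -> float:
    """Volumetric airflow needed to remove heat_watts at a given air temperature rise."""
    return heat_watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_kelvin)

DELTA_T = 15.0  # assumption: inlet-to-outlet temperature rise in kelvin

for load_kw in (5, 20):  # 5 kW is an assumed ordinary rack; 20 kW is from the quote upthread
    flow = airflow_m3_per_s(load_kw * 1000, DELTA_T)
    print(f"{load_kw} kW -> {flow:.2f} m^3/s ({flow * M3S_TO_CFM:.0f} CFM)")
```

The required airflow scales linearly with the load, so one extra-hot rack needs several times the air (or a move to liquid) while its neighbours in the same aisle do not.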

I'm waiting for the high pressure helium filled datacenter.
Yes, getting more power into a datacenter is much easier than adding the extra cooling capacity to remove that power once it has turned into heat. But I'd imagine they would plan for and monitor that aspect, and may even have redundant cooling systems. It's certainly a potential gotcha, though, and one that would soon sort out the bad datacenters when they end up seeing all their hosting overheat and go offline.
I think this is also a paradigm problem. Modern chipset advancements are at the crossroads of power vs. cooling. The logical extension of that fight is greater power and cooling requirements in the DC which it is not necessarily equipped to provide by default.
This is why I thought what Colovore did was pretty smart: they built liquid cooling into all of their racks. They are literally the densest datacenter I've found that actually allows people to colo with them (I'm sure there are plenty of companies who own their own datacenters that might be denser), but even with their systems you'd only be able to fit two of the Cerebras systems in a single rack (and you wouldn't be able to power both up 100% at the same time).

https://www.colovore.com/data-center/
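
The "two per rack, but not both at 100%" point is just a power budget calculation. A minimal sketch follows. The 35 kW rack budget and the 3.5 kW DGX-1 draw are assumptions for illustration, not figures from this thread; the 20 kW per wafer-scale system is the number quoted upthread.

```python
def systems_at_full_load(rack_budget_kw: float, system_kw: float) -> int:
    """How many systems can run at 100% within the rack's power budget."""
    return int(rack_budget_kw // system_kw)

def max_load_fraction(rack_budget_kw: float, system_kw: float, count: int) -> float:
    """Highest uniform load fraction for `count` systems sharing the budget."""
    return min(1.0, rack_budget_kw / (system_kw * count))

RACK_BUDGET_KW = 35.0   # assumption: a hypothetical high-density rack
WAFER_SYSTEM_KW = 20.0  # from the "20 kilowatts" quote upthread
DGX1_KW = 3.5           # assumption: rough per-system draw

print(systems_at_full_load(RACK_BUDGET_KW, DGX1_KW), "DGX-1s at full load")
print(systems_at_full_load(RACK_BUDGET_KW, WAFER_SYSTEM_KW), "wafer-scale system at full load")
print(f"two wafer-scale systems capped at {max_load_fraction(RACK_BUDGET_KW, WAFER_SYSTEM_KW, 2):.1%} each")
```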

> Why would they make a chip this big with AMD showing a chiplet design approach is cheaper and more scalable on so many levels. Let alone, yields.

They're taking a radically different approach, and hoping that they'll be able to route around defects, unlike AMD where a defect in the uncore kills the whole chiplet.

A lot of the people involved in this actually come from AMD, so I imagine they're familiar with the issues AMD ran into.
Not 100% of the chip is enabled: they disable defective parts and don't advertise a model with 100% of parts enabled, so they don't need magical zero-defect wafers.

Images of the whole computer were published, you can see the massive cooling system: https://www.tomshardware.com/news/worlds-largest-chip-gets-a...

Did they come up with an architecture which can route around any defect? Probably not. Now, granted, 90% of their chip is probably dedicated to compute, but I'd bet there's some management infrastructure where they absolutely cannot tolerate a defect.
They'll simply have redundant copies of that logic. And they'll be physically located at areas of the wafer that yield well - some areas are much worse than others and I would imagine they'll make use of that.
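
A rough sketch of why redundant copies help, assuming each copy fails independently (real defects cluster, so this is optimistic). The 99% per-copy yield is a made-up number for illustration.

```python
def prob_at_least_one_good(per_copy_yield: float, copies: int) -> float:
    """Probability that at least one of `copies` independent blocks is defect-free."""
    return 1.0 - (1.0 - per_copy_yield) ** copies

PER_COPY_YIELD = 0.99  # assumption: illustrative yield of one small management block

for copies in (1, 2, 3):
    print(copies, "copies ->", prob_at_least_one_good(PER_COPY_YIELD, copies))
```

Each extra copy multiplies the chance of losing all of them by the per-copy failure rate, so a handful of copies of a small block goes a long way.
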
Interesting, so on a die there are areas which are more prone to faults, and they are able to factor that into the design?

Though if there are known hotspots, wouldn't that point to the process node inducing them over silicon quality? Or is it a case of silicon production producing known hotspots that are predictable? FWIW, I'm currently leaning towards process node over the silicon being the source of hotspots, given what I know about silicon production.

With normal-sized dies, at the die-level, I've not seen people design around this; other than the more obvious places e.g. the corners (bad power delivery, prone to mechanical issues, normally left vacant) and the middle (gets hotter, also sometimes bad power). However, there are many test structures placed across the die to measure/check variations and also design rules that constrain the relative placements of certain things. That also goes towards increasing yield.

But at the wafer-level, yes.

> wouldn't that point to the process node inducing them over silicon quality?

I don't see why. I would only vaguely guess it's related to the manufacturing process they follow at that particular node. Maybe it's not even directly silicon related but something else.

I'm not convinced it's worthwhile separating out the process node and the silicon quality, they are entwined when looking across a large sample size.

Unfortunately, someone that actually knows why probably isn't allowed to share why.

>which does somewhat beg for a picture as just raises more questions.

There's a picture in the article.

>Why would they make a chip this big

Did you read the article?

>this article mentions TSMC. Another article I read that they used a 16nm node

Yes, 16nm/TSMC.

> There's a picture in the article.

Yes - hardly helpful ones, as you get a picture of a wafer and a box, with no breakdown beyond that. Hence I had a look and found other articles with much more detail that answer the questions I raised about the lack of pictures, like the cooling aspect, where you snipped my quote and removed that lovely thing we call context.

>Did you read the article?

Yes, and had you read what I said you would see that the article does not answer the aspects I was asking about - see what you did there.

>Yes, 16nm/TSMC

Yes - I found that in another article - which I also linked, you're welcome.

I'm really curious about the benefits of their implementation. It's far beyond my grasp to make any serious criticisms and I don't really want to doubt them, it just seems a pretty radical departure from even the direction of innovation.

The way they paint it sounds like they're putting in redundant cores to account for failure of what I would call the 'first line' cores, i.e. there are cores that are only used if some primary ones aren't working?

But sort of intuitively that doesn't make a whole lot of sense given the parallel nature. Maybe they are just putting in 101% of specified cores, and if there's a ~1% hopefully uniform-ish core failure rate then it's all gucci?

I guess my question is probably similar to yours, what are you giving up with yield-enhancing redundancy of a behemoth die vs integrating a bunch of confirmed working chiplets together?

The CEO says 1-1.5%.

"Cerebras approached the problem using redundancy by adding extra cores throughout the chip that would be used as backup in the event that an error appeared in that core’s neighborhood on the wafer. “You have to hold only 1%, 1.5% of these guys aside,” Feldman explained to me. Leaving extra cores allows the chip to essentially self-heal, routing around the lithography error and making a whole-wafer silicon chip viable."

https://techcrunch.com/2019/08/19/the-five-technical-challen...
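
To see how far a 1% spare budget goes, here is a minimal sketch that treats core failures as independent and lets any spare replace any failed core. Both are simplifications: the quote says spares cover their own neighborhood on the wafer. The core count and the per-core failure rates are assumptions for illustration.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of exactly k failures out of n cores."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def prob_enough_spares(n_cores: int, spares: int, p_fail: float) -> float:
    """P(number of failed cores <= spares), summed term by term in log space."""
    top = min(spares, n_cores)
    total = sum(math.exp(log_binom_pmf(k, n_cores, p_fail)) for k in range(top + 1))
    return min(total, 1.0)

N_CORES = 400_000       # assumption: illustrative core count
SPARES = N_CORES // 100  # 1% held aside, the low end of the quoted range

for p_fail in (0.005, 0.012):  # assumption: illustrative per-core failure rates
    print(p_fail, prob_enough_spares(N_CORES, SPARES, p_fail))
```

With this many cores the outcome is close to all-or-nothing: a spare fraction comfortably above the per-core failure rate almost always suffices, and one below it almost never does.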

The article claims that keeping everything on one die raises interconnect bandwidths and lowers latencies over what would be possible in a conventional supercomputing setup. Connections are made over the silicon that is normally left aside for cutting the chips apart. Apparently that is a special process that they had to collaborate with a partner in order to get working.
Chiplet designs mean that you still have to route signals either onto an interposer or onto a PCB. If you have a silicon interposer you have the same issue of making a really large silicon die. If you route into the PCB, then you may need SerDes depending on what you do, and bandwidth will be lower and latency will be higher due to signal integrity issues.

Maybe something like Intel's EMIB technology where they have small interposers along edges of chips rather than having a giant interposer might help here.

Yields are probably fairly good if they design for manufacturing by placing extra cores / wires to route around failures as I am sure they are.

The future of these interconnects is to make them optical. Once the interconnects are optical lots of problems get solved. Chips don’t have to be in the same enclosure, simplifying cooling etc.
I will dissent. Organic interposers are dirt cheap, and nearly as good unless all you want is density.
> A chip that size, imagine the yield

From discussion at a demo the yield is good, since they are using a large node. Their hardware rerouting also mitigates defects on most chips.

Many single-chip processors contain redundancy or the ability to route around bad units. Yield isn’t an issue if it has programmable datapaths, even at this scale.
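
A toy sketch of what routing around bad units can look like from the software side: logical core numbers are mapped onto whichever physical cores passed test. The function name and the flat list layout are hypothetical and for illustration only; real hardware would do this in the fabric, not in Python.

```python
def build_core_map(physical_ok: list[bool], logical_count: int) -> list[int]:
    """Map logical core i to the i-th working physical core, skipping bad ones."""
    working = [idx for idx, ok in enumerate(physical_ok) if ok]
    if len(working) < logical_count:
        raise ValueError("not enough working cores to meet the advertised count")
    return working[:logical_count]

# Hypothetical test results for 8 physical cores; cores 2 and 5 are defective.
physical_ok = [True, True, False, True, True, False, True, True]
core_map = build_core_map(physical_ok, logical_count=6)
print(core_map)
```

Software only ever sees the six logical cores, which is the same trick as advertising fewer units than are physically on the die.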