by throwawaymaths
555 days ago
So I talked with people from Cerebras at a trade show, and it seems like you need extremely advanced cooling to keep that thing working. IIRC the user agreement says "you can't turn it off or else the warranty is void". Given the way their chip is designed, I would be strongly worried that the giant chip has warping issues, for example when certain cores are dark and heat generation is uneven (or if it gets shut down by accident in the middle of running LLM inference). There may even be chip-to-chip variation depending on which cores got disqualified during their on-the-spot testing. Already through the grapevine I'm hearing that H100s and B100s have to be replaced more often... than you'd want? I suspect people are mum about it, otherwise they might lose sweetheart discounts from Nvidia. I can't imagine that Cerebras, even with the extreme engineering of their cooling system, has truly solved cooling in a way that isn't a pain in the ass (otherwise why would they have the clause?), and if I were building a datacenter I would be very worried about having to do annoying and capital-intensive replacements.