by CuriouslyC 3 days ago
This doesn't match my experience. In academia I saw NVIDIA GPU clusters running at ~40-45% utilization that went 6 years with a <20% failure rate. Might be a TPU thing?

I'm FAR from an expert on this, but I believe that operating costs such as power + cooling form a big part of the lifecycle cost. I have no doubt that at some point within the 6 years being booked, replacing entire working racks will become more cost efficient than continuing to run them.
That is current practice, yes. The economics of replacing racks and then selling the old ones to people who will salvage and resell the working components work out better than trying to repair/retrofit in place.