by sumeno 12 days ago
They don't immediately become worthless, but they don't last all that long either

https://www.tomshardware.com/pc-components/gpus/datacenter-g...

This doesn't match my experience. In academia I saw NVIDIA GPU clusters running at ~40-45% utilization go 6 years with a <20% failure rate. Might be a TPU thing?
I'm FAR from an expert on this, but I believe that operating costs such as power and cooling make up a big part of the lifecycle cost. I have no doubt that at some point within the 6 years being booked, replacing entire working racks will be more cost efficient than continuing to run them.
That is current practice, yes. The economics of replacing racks and then selling the old ones to people who salvage and resell working components work out better than trying to repair or retrofit in place.
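
For anyone who wants to poke at the keep-vs-replace argument, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the cost model (annualized capex net of resale, plus power and cooling, divided by a throughput multiplier), and every number are placeholders, not figures from the article or from any real datacenter.

```python
HOURS_PER_YEAR = 8760


def annual_energy_cost(power_kw, price_per_kwh, cooling_overhead):
    """Yearly power bill for a rack, with cooling modeled as a fractional
    overhead on top of IT power (hypothetical simplification)."""
    return power_kw * HOURS_PER_YEAR * price_per_kwh * (1 + cooling_overhead)


def annual_cost_keep(old_power_kw, price_per_kwh, cooling_overhead):
    """Cost of keeping the old, already paid-for rack running for a year."""
    return annual_energy_cost(old_power_kw, price_per_kwh, cooling_overhead)


def annual_cost_replace(new_capex, resale_value, lifetime_years,
                        new_power_kw, price_per_kwh, cooling_overhead,
                        speedup):
    """Yearly cost of a new rack per unit of old-rack work.

    Capex (net of what the old rack resells for) is spread evenly over the
    booked lifetime, energy is added, and the total is divided by `speedup`
    because the new rack is assumed to do that many times the work.
    """
    capex_per_year = (new_capex - resale_value) / lifetime_years
    energy = annual_energy_cost(new_power_kw, price_per_kwh, cooling_overhead)
    return (capex_per_year + energy) / speedup


# Placeholder inputs only; swap in real numbers before drawing conclusions.
keep = annual_cost_keep(old_power_kw=40, price_per_kwh=0.10,
                        cooling_overhead=0.5)
replace = annual_cost_replace(new_capex=500_000, resale_value=50_000,
                              lifetime_years=6, new_power_kw=60,
                              price_per_kwh=0.10, cooling_overhead=0.5,
                              speedup=4)
print("replace is cheaper per unit of work:", replace < keep)
```

With these placeholder inputs the replacement comes out cheaper per unit of work, but that is purely a property of the numbers chosen. Drop the assumed speedup or raise the capex and the answer flips, which is roughly the disagreement in this thread.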