Hacker News
by u1hcw9nx 106 days ago
They are built to physically last 5-7 years in 24/7 datacenter use, but their effective lifetime is just 3-4 years; after that their value has depreciated, and electricity and infrastructure costs dominate. Meta did a benchmark where 9% of the chips failed every year, and 'infant mortality' is much higher in the first 3 months of use.
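To put the 9% figure in perspective against the 3-4 and 5-7 year lifetimes, here is a back-of-the-envelope sketch. It assumes a constant 9% annual failure rate with failures independent from year to year. That is a simplification: it ignores the higher infant-mortality rate in the first months, so real early-life losses would be worse.

```python
# Assumption: constant 9% annual failure rate, failures independent year to year.
# This ignores the elevated "infant mortality" in the first months of use.
ANNUAL_FAILURE_RATE = 0.09


def surviving_fraction(years: int, annual_rate: float = ANNUAL_FAILURE_RATE) -> float:
    """Fraction of a fleet still working after `years` full years."""
    return (1.0 - annual_rate) ** years


if __name__ == "__main__":
    # Compare the "effective" 3-4 year window with the 5-7 year physical design life.
    for years in (3, 4, 5, 7):
        print(f"{years} years: {surviving_fraction(years):.1%} still running")
```

Under these assumptions about 69% of a fleet survives four years (0.91^4 ≈ 0.686), and only about half makes it to seven.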

9% is an absurd failure rate for solid-state electronics, particularly considering the profit margins. I assume it's related to the power densities involved. Would you happen to recall the source?
It's pretty bad: https://www.datacenterdynamics.com/en/news/meta-report-detai...

Jensen said they added a lot of RAS (reliability, availability, and serviceability) features in Blackwell, which kind of admits Hopper wasn't reliable enough.