by manquer
67 days ago
Azure ran its K80/P100 fleets a bit longer, for 8-9 years. Google does 9 years for TPUs.

For the current generation, there are plenty of open questions around:

- the viability of training-to-inference cascades (the key to extended life), given custom ASICs hitting production like Cerebras did early this year.

- the energy efficiency of older chips in tight energy environments. New grid capacity constraints alone favor running newer, more efficient chips, even ignoring a possible short-term (< 1 year) price shock due to war.

- higher failure rates (lower MTBF). Compared to older GPUs, modern nodes are 8-GPU clusters built on 2/3 nm processes and dependent on HBM; the tolerances are much tighter, especially for training.

- new DCs being spun up under less-than-ideal conditions due to permitting, part supply and other constraints, which will impact the operating environment.

Notwithstanding all these issues, and even taking a generous 10-year useful life, the expenses dwarf every megaproject before this one.