|
|
|
|
|
by SethTro
1212 days ago
|
|
(1022362 + 82432) gpu-hours / 2048gpus / 5 months ~= 15% uptime. That's only 0.08 nines of availability! I remember in one of their old guidebooks a lot of struggle to keep their 64 machine (512 gpu) cluster running this was probably 4x the machines and 4x the number of cluster dropouts. |
|