Hacker News
Resiliency at Scale: Managing Google's TPUv4 Machine Learning Supercomputer (micahlerner.com)
1 point by mlerner 506 days ago