Hacker News
by groodt 1165 days ago
Thanks for publishing this. I quickly skimmed the paper and saw the impressive linear scaling up to 16 nodes. How long did it take to train the various models in wall-clock time?