by angry-tempest 1384 days ago
Probably not for a few years. You'd need an A100 (maybe a few) just to backprop through a model that big in float32.
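Rough back-of-envelope for the memory side of that, as a sketch only. Assumptions that are mine and not from the thread: an Adam-style optimizer keeping two extra float32 values per parameter, activations ignored entirely, and an A100 with 40 or 80 GB of memory. The parameter counts are hypothetical placeholders, since the actual model size isn't stated here.

```python
import math

BYTES_PER_FLOAT32 = 4


def training_bytes(n_params, optimizer_states=2):
    """Bytes for weights + gradients + optimizer states, all in float32.

    optimizer_states=2 assumes an Adam-style optimizer (two moment buffers).
    Activation memory is ignored, so this is a lower bound.
    """
    copies = 1 + 1 + optimizer_states  # weights, grads, optimizer buffers
    return n_params * BYTES_PER_FLOAT32 * copies


def min_gpus(n_params, gpu_mem_gb=80, optimizer_states=2):
    """Fewest GPUs whose combined memory holds the training state."""
    total = training_bytes(n_params, optimizer_states)
    return math.ceil(total / (gpu_mem_gb * 1e9))


if __name__ == "__main__":
    # Hypothetical model sizes, for illustration only.
    for n in (1_000_000_000, 10_000_000_000):
        print(n, training_bytes(n) / 1e9, "GB,", min_gpus(n), "x 80 GB GPU(s)")
```

Under those assumptions it works out to 16 bytes per parameter, so the state alone outgrows a single card quickly, before counting activations.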
1 comment

iirc, they tweeted about using around 3800 in parallel