by solveit 619 days ago
Setting aside the practicality of reproducing training runs that cost tens of millions, it's hopeless. Determinism is hard enough on a single GPU; fixing a seed isn't going to help much when training is distributed across hundreds of GPUs.
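
To make the point concrete, here is a rough sketch of one reason a seed doesn't save you: floating-point addition isn't associative, so the order in which partial sums get combined changes the answer. This is a toy model in plain Python, not real GPU or all-reduce code; `naive_sum`, `reduce_in_chunks`, and the `grads` values are made up for illustration, and "workers" here are just contiguous chunks of a list.

```python
def naive_sum(values):
    # Plain left-to-right accumulation, written out by hand so the
    # rounding behaviour doesn't depend on how the built-in sum() is
    # implemented in a given Python version.
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_in_chunks(values, num_workers):
    # Hypothetical stand-in for a distributed reduction: each "worker"
    # sums one contiguous chunk, then the partial sums are combined.
    size = -(-len(values) // num_workers)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Same inputs, no randomness anywhere. The exact sum is 2.0.
grads = [1e16, 1.0, -1e16, 1.0]

one_worker = reduce_in_chunks(grads, 1)   # ((1e16 + 1) - 1e16) + 1
two_workers = reduce_in_chunks(grads, 2)  # (1e16 + 1) + (-1e16 + 1)

print(one_worker, two_workers)
```

The two results differ even though nothing here is random, which is the sense in which a seed can't help: on real hardware the combination order can depend on scheduling and on how the work is partitioned.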