by solveit
619 days ago
Setting aside the practicality of reproducing training runs that cost tens of millions of dollars, it's hopeless. Determinism is hard enough on a single GPU; fixing a seed isn't going to help much when training is distributed across hundreds of GPUs.
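A commonly cited reason a fixed seed isn't enough is that floating-point addition is not associative. The order in which a distributed reduction combines partial results can therefore change the answer even when every input is identical. Below is a minimal Python sketch of that effect. The four values are made up to stand in for per-GPU contributions, and the left-to-right loop is a deliberately naive reduction, not how any particular collective library works.

```python
import itertools

# Hypothetical per-worker contributions (illustrative values only).
# The inputs never change, just as they wouldn't with a fixed seed.
contributions = [1e16, 1.0, -1e16, 1.0]

def reduce_sum(values):
    """Add values left to right, like a naive reduction would."""
    total = 0.0
    for v in values:
        total += v  # each addition rounds to the nearest double
    return total

# Try every possible arrival order of the same four contributions.
results = {reduce_sum(order) for order in itertools.permutations(contributions)}
print(sorted(results))  # the same inputs give several different sums
```

With these values the sorted results are `[0.0, 1.0, 2.0]`. Near 1e16 adjacent doubles are 2 apart, so a `1.0` added to a partial sum of that magnitude is rounded away. Whether that happens depends only on the order of the additions.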