by version_five
1523 days ago

This reflects my experience as well. Some frameworks, such as PyTorch, have a reproducibility setting that executes everything deterministically, at the expense of performance. I've done a lot of ensembling work where we train multiple copies of a model, and we would generally start each copy with a different seed. If we start with the same seed but don't force training to be deterministic, the results are typically still different on each run. I haven't actually checked whether they are "less different" than runs that start from different random seeds for all of the initialization, though. There is a loss-landscape paper that looks at how the weights vary under different kinds of perturbations. It would be interesting to repeat that analysis with GPU thread noise as the only source of randomness and see what happens.
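Much of that "GPU thread noise" comes from parallel reductions (for example, atomic adds) accumulating floating-point values in an order that depends on thread scheduling. Floating-point addition is not associative, so a different order can give a different result. Below is a CPU-only toy sketch of that effect. The helpers `make_partials`, `sequential_sum` and `scheduled_sum` are hypothetical, and the random shuffle only simulates scheduling. It is not how PyTorch works internally. In PyTorch itself, the relevant switches are `torch.manual_seed(...)` for seeding and `torch.use_deterministic_algorithms(True)` for deterministic kernels.

```python
import math
import random


def make_partials(seed: int, n: int = 10_000):
    # Same seed -> identical values, like a fixed weight/data initialization.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sequential_sum(values):
    # Plain left-to-right accumulation; each += rounds to the nearest double.
    total = 0.0
    for v in values:
        total += v
    return total


def scheduled_sum(values, schedule_seed: int):
    # Hypothetical stand-in for a parallel reduction whose accumulation
    # order depends on thread scheduling: permute, then accumulate.
    order = list(values)
    random.Random(schedule_seed).shuffle(order)
    return sequential_sum(order)


# Floating-point addition is not associative.
print(sequential_sum([1e16, 1.0, -1e16]))  # 0.0 (the 1.0 is rounded away)
print(sequential_sum([1e16, -1e16, 1.0]))  # 1.0

values = make_partials(seed=0)

# Fixed order (the "deterministic mode" analogue): bit-identical every time.
print(sequential_sum(values) == sequential_sum(values))  # True

# Order varies per "run": same seed, slightly different totals.
runs = [scheduled_sum(values, schedule_seed=k) for k in range(8)]
for r in runs:
    print(repr(r), "error vs exact sum:", r - math.fsum(values))
```

Each "run" here starts from identical inputs, and the totals differ only in their trailing bits. In real training, such differences feed into every later step, which is presumably why same-seed runs without deterministic mode end up visibly different.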