by abdullin
894 days ago
Training, especially on large GPU clusters, is inherently non-deterministic, even if all seeds are fixed. This boils down to framework implementations, timing issues, and the extra cost of trying to ensure determinism (which still comes without guarantees).
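
To make the "timing issues" part concrete, here is a minimal sketch of one commonly cited mechanism. It is my assumption, not something spelled out above: floating-point addition is not associative, so if a parallel reduction adds the same numbers in a different order from run to run, the result can differ even with fixed seeds. The helper `sum_in_order` and the sample values are hypothetical, picked only to make the effect obvious.

```python
def sum_in_order(values, order):
    """Add up `values` following the index sequence in `order`.

    Hypothetical stand-in for a parallel reduction whose accumulation
    order depends on which worker happens to finish first.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same three numbers, two accumulation orders.
values = [1e16, 1.0, -1e16]

a = sum_in_order(values, [0, 1, 2])  # (1e16 + 1.0) - 1e16: the 1.0 is rounded away
b = sum_in_order(values, [0, 2, 1])  # (1e16 - 1e16) + 1.0: the 1.0 survives

print(a, b, a == b)
```

In real training the differences are far smaller per operation, but they can compound over many steps. Frameworks typically offer opt-in deterministic modes, which ties back to the extra cost mentioned above.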