by abdullin 894 days ago
Training, especially on large GPU clusters, is inherently non-deterministic, even if all seeds are fixed.

This boils down to framework implementations, timing issues, and the extra cost of trying to enforce determinism (which still comes without guarantees).
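
One likely mechanism behind the timing part, as a rough sketch in plain Python rather than anything a framework literally runs: floating-point addition is not associative, so if a parallel reduction accumulates partial results in whatever order workers happen to finish, the total can differ between runs with identical seeds. The helper `sum_in_order` and the sample values are made up for illustration.

```python
def sum_in_order(values, order):
    """Accumulate values in the given index order (hypothetical helper).

    Stands in for a parallel reduction whose accumulation order
    depends on which worker finishes first.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Regrouping the same three numbers changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)

# Same four values, two arrival orders, two different sums:
# a small term is lost when it is added to a huge one first.
values = [1e16, 1.0, -1e16, 1.0]
a = sum_in_order(values, [0, 1, 2, 3])
b = sum_in_order(values, [0, 2, 1, 3])
print(a, b)
```

Tiny differences like this can then compound over many training steps, which is why fixing seeds alone is not enough.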