| HN Mirror

	by abdullin 894 days ago
	Training, especially on large GPU clusters, is inherently non-deterministic, even if all seeds are fixed. This boils down to framework implementations, timing issues, and the extra cost of trying to ensure determinism (which still comes with no guarantees).
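
	A minimal sketch of how a timing issue can change results even with fixed seeds. It assumes the mechanism is floating-point addition not being associative, so a sum depends on the order in which partial results arrive. The comment above does not name a specific mechanism, so treat this as one plausible illustration. `reduce_in_order` and the `grads` values are hypothetical and not taken from any framework.

```python
def reduce_in_order(values, order):
    """Sum `values` left to right, visiting them in the given arrival order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Hypothetical per-worker partial gradients (illustrative values only).
grads = [0.1, 0.2, 0.3]

# Same inputs, same "seed", but the partial results arrive in a different order.
run_a = reduce_in_order(grads, [0, 1, 2])
run_b = reduce_in_order(grads, [1, 2, 0])

print(run_a == run_b)  # False: the two sums differ in the last bit
print(run_a, run_b)
```

	The difference here is a single rounding error, but over many training steps such discrepancies can compound, which is why two runs with identical seeds may still diverge.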