| HN Mirror



	by nnevatie 51 days ago
	> Namely setting temperature to zero, and turning off all history

	That's not nearly enough, though. Multi-node/multi-GPU inference, and specifically batching (and the ordering within a batch), have non-deterministic consequences for current LLM services.
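
To illustrate why batching and ordering can matter even at temperature zero: one commonly cited mechanism (an assumption here, since the comment does not name it) is that floating-point addition is not associative, so the same numbers reduced in a different grouping can round differently. A minimal sketch in plain Python, where `naive_sum` and `reduce_in_chunks` are hypothetical helpers standing in for a batched reduction, not any real inference kernel:

```python
def naive_sum(values):
    """Accumulate left to right with ordinary float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_in_chunks(values, chunk_size):
    """Hypothetical batched reduction: sum each chunk, then sum the partial sums."""
    partials = [
        naive_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return naive_sum(partials)


# Same three numbers, different grouping, different rounding.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
right_to_left = a + (b + c)
print(left_to_right == right_to_left)  # False

# Same four numbers, different chunk size, different result.
values = [1e16, 1.0, -1e16, 1.0]
print(reduce_in_chunks(values, 1))  # 1.0
print(reduce_in_chunks(values, 2))  # 0.0
```

If the chunking or ordering depends on what else happens to be in the batch, tiny differences like these can end up changing which token has the highest score.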

1 comment

True, but for small models it's pretty close. See my comment below about other cases that lead to nondeterminism.