Hacker News new | ask | show | jobs
by nnevatie 51 days ago
> Namely setting temperature to zero, and turning off all history

That's not nearly enough, though. The multi-node/GPU inference and specifically batching (and ordering in batching) have non-deterministic consequences for the current LLM services.

1 comments

True but for small models it's pretty close. See my comment below about other cases leading to nondeterminism.