by nnevatie
51 days ago
> Namely setting temperature to zero, and turning off all history

That's not nearly enough, though. Multi-node/GPU inference, and specifically batching (including the order of requests within a batch), has non-deterministic consequences for current LLM services.
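One commonly cited reason ordering can matter (an assumption on my part, not something any particular service has confirmed) is that floating-point addition is not associative, so the same numbers summed in a different order can give slightly different results. A minimal sketch in plain Python; `sum_in_order` is a hypothetical helper for illustration, not part of any inference stack:

```python
def sum_in_order(values, order):
    """Accumulate values left to right, visiting them in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Same three numbers, two groupings.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

# Same idea with a large magnitude gap: the 1.0 is lost when it is
# added to 1e16 first, but survives when the large terms cancel first.
values = [1e16, 1.0, -1e16]
lost = sum_in_order(values, [0, 1, 2])   # 0.0
kept = sum_in_order(values, [0, 2, 1])   # 1.0

print(left == right, lost, kept)
```

If reductions on a GPU end up accumulating in a different order from one run to the next, tiny differences like these could in principle flip a near-tie between two tokens even at temperature zero.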