
But isn't the link shared in that comment describing exactly that?

https://sulbhajain.medium.com/why-llms-arent-truly-determini...

>The Thinking Machines research team showed it’s possible to fix this. They built batch-invariant kernels for RMSNorm, matrix multiplication, and attention, integrating them into the open-source inference engine vLLM.

>The outcome: 1,000 identical prompts, 1,000 identical outputs. Perfect reproducibility.
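For anyone wondering why batch invariance matters here: floating-point addition is not associative, so a kernel that changes how it splits a reduction depending on batch size can produce slightly different numbers for the same input. Below is a toy sketch of that effect in plain Python. It is not the Thinking Machines kernels or any vLLM API; `split_sum` and `FIXED_SPLIT` are made-up names for illustration, and the "split" stands in for whatever tiling a real kernel might pick.

```python
def left_to_right_sum(values):
    """Accumulate strictly left to right, so the order is fully specified."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def split_sum(values, split):
    """Reduce values[:split] and values[split:] separately, then combine.

    Hypothetical stand-in for a kernel whose reduction layout depends on
    how the work was tiled (for example, on the batch size).
    """
    return left_to_right_sum(values[:split]) + left_to_right_sum(values[split:])


values = [0.1, 0.2, 0.3]

# Same inputs, two different reduction layouts.
a = split_sum(values, 1)  # 0.1 + (0.2 + 0.3)
b = split_sum(values, 2)  # (0.1 + 0.2) + 0.3
print(a == b)  # False: the results differ in the last bit

# "Batch-invariant" in this toy sense: always use the same layout.
FIXED_SPLIT = 2  # assumed constant, chosen arbitrarily
outputs = {split_sum(values, FIXED_SPLIT) for _ in range(1000)}
print(len(outputs))  # 1: every run gives the identical value
```

The point of the sketch is only that pinning the reduction order removes the variation; doing that efficiently on a GPU for RMSNorm, matmul, and attention is the hard part the quoted article is talking about.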