by qsera
109 days ago
But isn't the link shared in that comment describing exactly that? https://sulbhajain.medium.com/why-llms-arent-truly-determini...

> The Thinking Machines research team showed it’s possible to fix this. They built batch-invariant kernels for RMSNorm, matrix multiplication, and attention, integrating them into the open-source inference engine vLLM.

> The outcome: 1,000 identical prompts, 1,000 identical outputs. Perfect reproducibility.
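For anyone wondering what "batch-invariant" buys you, here is a toy sketch in plain Python. It is not the Thinking Machines or vLLM code, and the function names and the chunk-size heuristic are made up for illustration. The only real fact it relies on is that floating-point addition is not associative, so a reduction that changes its split strategy with the batch size can return different results for the same row.

```python
def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks left to right, then add the partial sums."""
    partials = []
    for start in range(0, len(values), chunk_size):
        total = 0.0
        for v in values[start:start + chunk_size]:
            total += v
        partials.append(total)
    result = 0.0
    for p in partials:
        result += p
    return result


def load_dependent_sum(values, batch_size):
    # Hypothetical heuristic: the split strategy depends on how many
    # requests share the batch, so the addition order changes with load.
    chunk_size = 4 if batch_size == 1 else 2
    return chunked_sum(values, chunk_size)


def batch_invariant_sum(values, batch_size):
    # Same reduction order no matter how large the batch is.
    return chunked_sum(values, 2)


row = [1e16, 1.0, -1e16, 1.0]

alone = load_dependent_sum(row, batch_size=1)    # ((1e16 + 1) - 1e16) + 1
batched = load_dependent_sum(row, batch_size=8)  # (1e16 + 1) + (-1e16 + 1)
print(alone, batched, alone == batched)

fixed_alone = batch_invariant_sum(row, batch_size=1)
fixed_batched = batch_invariant_sum(row, batch_size=8)
print(fixed_alone, fixed_batched, fixed_alone == fixed_batched)
```

Neither answer from the load-dependent version is "more correct" in a way the caller can control; the point is only that the fixed-order version returns the same bits regardless of what else is in the batch, which is the property the quoted passage is claiming for the real kernels.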