| HN Mirror

My pet peeve here is mistaking nondeterminism with model unreliability. The issue is not that these models or inference engines are inherently nondeterministic, they aren't, but that the former are just plain unreliable.

I can ask the example question multiple times, and often it will tell me absolute balooney. This is not the model being nondeterministic, it's the model being not very good. It's inconsistent across what we consider semantically equivalent prompts. This is a gap in alignment, not some sort of computational property, like nondeterminism. If you ask it 4+4 in multiple different ways, it will always(?) reply correctly, but ask it something more complex, and you can get seriously different results.

The same exchange can be played out multiple times again and again, and it will reply with the exact same misunderstandings in the exact same order again and again, byte to byte matching. The kicker here is batched inference being a necessity for economical reasons at scale, and exactly matching outputs being impractical because of their non-formal target usage. So you get these wishy washy back and forths about whether the models are deterministic or not, which is super frustrating and misses the point.

And I have to keep dodging bullets like "well if you swap the hardware from under it it might change" or "if you prompt a remote service where they control what's deployed and don't tell you fully, and batch your prompt together with a bunch of other people's which you have no control over, it will not be reproducible". Yes, it might not be, obviously. Even completely deterministic code can become nondeterministic if you run it on top of an execution environment that makes it so. That's not exactly a reasonable counter I don't think.

With regular software, there's a contract (the specifications) that code is developed against, and variances are constrained within there. This is not the case with LLMs, the variances are letting jesus take over the wheel tier, and that's a perfectly fine retort. But then it's not nondeterminism that's the issue.