|
|
|
|
|
by saurik
356 days ago
|
|
This is really brittle, though, and in a way that matters a lot more than reproducible builds. Like, sure: a slightly different compiler might cause a slightly different bug to be introduced into the code, but here a slight bit of non-determinism leads to a drastically and fundamentally different result! And not only is the exact inference engine version important, but sometimes the exact hardware as well (when run on GPU)... |
|
I can ask the example question multiple times, and often it will tell me absolute balooney. This is not the model being nondeterministic, it's the model being not very good. It's inconsistent across what we consider semantically equivalent prompts. This is a gap in alignment, not some sort of computational property, like nondeterminism. If you ask it 4+4 in multiple different ways, it will always(?) reply correctly, but ask it something more complex, and you can get seriously different results.
The same exchange can be played out multiple times again and again, and it will reply with the exact same misunderstandings in the exact same order again and again, byte to byte matching. The kicker here is batched inference being a necessity for economical reasons at scale, and exactly matching outputs being impractical because of their non-formal target usage. So you get these wishy washy back and forths about whether the models are deterministic or not, which is super frustrating and misses the point.
And I have to keep dodging bullets like "well if you swap the hardware from under it it might change" or "if you prompt a remote service where they control what's deployed and don't tell you fully, and batch your prompt together with a bunch of other people's which you have no control over, it will not be reproducible". Yes, it might not be, obviously. Even completely deterministic code can become nondeterministic if you run it on top of an execution environment that makes it so. That's not exactly a reasonable counter I don't think.
With regular software, there's a contract (the specifications) that code is developed against, and variances are constrained within there. This is not the case with LLMs, the variances are letting jesus take over the wheel tier, and that's a perfectly fine retort. But then it's not nondeterminism that's the issue.