by perching_aix 350 days ago
And that exact prompt and that exact inference engine [version].

Pretty reasonable if you ask me. All of this was to say: these are still programs, and all the regular sensibilities still apply. Heck, even in that respect, the sensibilities that apply are pretty old-school: in modern, threaded applications you'd expect runtime behavioral variations, which is not the case here. Even for the high-level language compilers referred to in the article, bit-for-bit reproducibility doesn't come so easily. The folks over at reproducible builds [0] put in, to my knowledge, a decent bit of effort to make it happen.

The overarching point being that it's not magic: it's technology. And if you hold them to even broadly similar standards as you hold compilers to, they are absolutely deterministic.
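
To make the "same prompt, same engine, same output" claim concrete, here is a minimal sketch of greedy decoding over a hypothetical toy model. `toy_logits` and `VOCAB` are invented stand-ins for a real forward pass and tokenizer: the scores come from a SHA-256 hash of the context, so the "model" is a pure function of its input. Real inference engines are far more complex; the sketch only shows that the decoding loop itself introduces no randomness when sampling is greedy.

```python
from __future__ import annotations

import hashlib

# Hypothetical tiny vocabulary; not taken from any real model.
VOCAB = ["the", "model", "is", "deterministic", "not", "."]

def toy_logits(context: tuple[str, ...]) -> list[float]:
    """Hypothetical stand-in for a forward pass: a pure function of the context."""
    scores = []
    for token in VOCAB:
        digest = hashlib.sha256((" ".join(context) + "|" + token).encode()).digest()
        scores.append(int.from_bytes(digest[:4], "big") / 2**32)
    return scores

def greedy_decode(prompt: list[str], max_new_tokens: int = 8) -> list[str]:
    context = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(tuple(context))
        # Greedy decoding: always take the highest-scoring token, no sampling.
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        context.append(VOCAB[best])
    return context[len(prompt):]

prompt = ["is", "the", "model"]
runs = [greedy_decode(prompt) for _ in range(5)]
print(runs[0])
print("identical across runs:", all(r == runs[0] for r in runs))
```

Swapping in temperature sampling with a fixed seed would keep the same property; what breaks it is changing anything the function depends on (weights, prompt, or the arithmetic underneath).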

If you mean that picking anything other than what I picked here makes the process cease to be deterministic, that is not true. You can trivially test and confirm that the same way I did.

[0] https://reproducible-builds.org/


This is really brittle, though, and in a way that matters a lot more than reproducible builds. Like, sure: a slightly different compiler might cause a slightly different bug to be introduced into the code, but here a slight bit of non-determinism leads to a drastically and fundamentally different result! And not only is the exact inference engine version important, but sometimes the exact hardware as well (when run on GPU)...
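
The hardware sensitivity usually comes down to floating-point arithmetic: addition is not associative, so two kernels (or two GPUs) that reduce the same numbers in a different order can disagree in the last bit. A minimal sketch, assuming a made-up two-token vocabulary where token "A"'s logit is the sum of three hypothetical partial products, shows how that last bit can flip a greedy choice once two logits are tied or nearly tied:

```python
from __future__ import annotations

def reduce_left(xs: list[float]) -> float:
    """Accumulate left to right: ((x0 + x1) + x2) + ..."""
    total = 0.0
    for x in xs:
        total += x
    return total

def reduce_right(xs: list[float]) -> float:
    """Accumulate right to left, i.e. x0 + (x1 + (x2 + ...))."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total

def greedy_pick(vocab: list[str], logits: list[float]) -> str:
    # Highest logit wins; on an exact tie, max() keeps the first index.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]

vocab = ["B", "A"]                # hypothetical two-token vocabulary
contributions = [0.1, 0.2, 0.3]   # hypothetical partial products for token "A"

for reducer in (reduce_left, reduce_right):
    logits = [0.6, reducer(contributions)]  # token "B" has a fixed logit of 0.6
    print(reducer.__name__, logits, greedy_pick(vocab, logits))
# reduce_left [0.6, 0.6000000000000001] A
# reduce_right [0.6, 0.6] B
```

Each reducer is deterministic on its own; the answer changes only because the reduction order changed, which is exactly what a different kernel or GPU can do.
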

My pet peeve here is mistaking nondeterminism for model unreliability. The issue is not that these models or inference engines are inherently nondeterministic (they aren't), but that the models are just plain unreliable.

I can ask the example question multiple times, and often it will tell me absolute baloney. This is not the model being nondeterministic, it's the model being not very good. It's inconsistent across what we consider semantically equivalent prompts. This is a gap in alignment, not some sort of computational property like nondeterminism. If you ask it 4+4 in multiple different ways, it will always(?) reply correctly, but ask it something more complex, and you can get seriously different results.
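
The distinction between nondeterminism and inconsistency can be made concrete with two separate checks. The sketch below uses a hypothetical lookup-table "model" (the table, its phrasings and its answers are invented for illustration): it is a pure function of the prompt text, so it passes a repeat-the-same-prompt test every time, yet it still disagrees with itself across paraphrases of a harder question.

```python
from __future__ import annotations

# Hypothetical toy "model": answers depend only on the normalized prompt text.
ANSWERS = {
    "what is 4+4?": "8",
    "what is four plus four?": "8",
    "is 1009 a prime number?": "yes",
    "is the number 1009 prime?": "no",  # wrong (1009 is prime), but always the same
}

def toy_model(prompt: str) -> str:
    key = " ".join(prompt.lower().split())  # normalize case and whitespace
    return ANSWERS.get(key, "I'm not sure.")

def is_deterministic(model, prompt: str, runs: int = 5) -> bool:
    """Same prompt, repeated: does the output ever change?"""
    return len({model(prompt) for _ in range(runs)}) == 1

def is_consistent(model, paraphrases: list[str]) -> bool:
    """Semantically equivalent prompts: do they all get the same answer?"""
    return len({model(p) for p in paraphrases}) == 1

simple = ["What is 4+4?", "what is  FOUR plus four?"]
hard = ["Is 1009 a prime number?", "Is the number 1009 prime?"]

for name, group in [("simple", simple), ("hard", hard)]:
    print(name, all(is_deterministic(toy_model, p) for p in group), is_consistent(toy_model, group))
# simple True True
# hard True False
```

Both groups pass the determinism check; only the harder one fails the consistency check, which is the property actually being complained about.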

The same exchange can be played out again and again, and it will reply with the exact same misunderstandings in the exact same order every time, matching byte for byte. The kicker here is that batched inference is a necessity for economic reasons at scale, and that exactly matching outputs are impractical to insist on anyway, because the target usage isn't a formal one. So you get these wishy-washy back-and-forths about whether the models are deterministic or not, which is super frustrating and misses the point.

And I have to keep dodging bullets like "well, if you swap the hardware out from under it, it might change" or "if you prompt a remote service where they control what's deployed and don't fully tell you, and batch your prompt together with a bunch of other people's which you have no control over, it will not be reproducible". Yes, it might not be, obviously. Even completely deterministic code can become nondeterministic if you run it on top of an execution environment that makes it so. I don't think that's exactly a reasonable counter.
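
The batching caveat can be sketched the same way. Assume a hypothetical kernel that picks its reduction tile size from the batch size (the rule in `tile_size_for` below is invented; real engines have their own, often undocumented, heuristics). Every configuration is perfectly deterministic, but the same request's result depends on how many other requests it was batched with:

```python
from __future__ import annotations

def tiled_sum(xs: list[float], tile: int) -> float:
    """Sum each tile left to right, then sum the per-tile partials left to right."""
    partials = []
    for start in range(0, len(xs), tile):
        acc = 0.0
        for x in xs[start:start + tile]:
            acc += x
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total

def tile_size_for(batch_size: int) -> int:
    # Hypothetical heuristic: a lone request gets one big tile, batches get smaller tiles.
    return 4 if batch_size == 1 else 2

def run_request(row: list[float], batch_size: int) -> float:
    return tiled_sum(row, tile_size_for(batch_size))

# Values chosen so rounding is visible: at 2**53, adding a single 1.0 is lost to rounding.
row = [2.0**53, 1.0, 1.0, 1.0]
alone = run_request(row, batch_size=1)
batched = run_request(row, batch_size=8)
print(alone, batched)
# 9007199254740992.0 9007199254740994.0
```

`run_request` is a pure function of `(row, batch_size)`; the variation only shows up because the caller does not control `batch_size`, which is the execution-environment point made above.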

With regular software, there's a contract (the specification) that code is developed against, and variances are constrained within it. This is not the case with LLMs: the variances are Jesus-take-the-wheel tier, and that's a perfectly fine retort. But then it's not nondeterminism that's the issue.

> Pretty reasonable if you ask me.

Where can I download ChatGPT models?

> The folks over at reproducible builds

I like those folks (hello hboeck!), but their work is unrelated to deterministic LLM output, so why even bring them up here?

> The overarching point being that it's not magic: it's technology.

Yeah, are you even responding to me, or is this just a stream of thought?

No one said LLMs are magic.

> Where can I download ChatGPT models?

Nowhere. Does that make LLMs nondeterministic?

> I like those folks (hello hboeck!), but their work is unrelated to deterministic LLM output, so why even bring them up here?

Did you read the blogpost in the OP?

I cannot understand what I wrote on your behalf. What could possibly be unclear about it? Or was this another rhetorical question like the previous one, to polish your snark?

> Yeah, are you even responding to me, or is this just a stream of thought?

Yes, I was. Are you able to ask non-rhetorical questions too? Should I have asked for your blessed permission to write more than just a direct address of what you asked about?

>> Where can I download ChatGPT models?

> Nowhere. Does that make LLMs nondeterministic?

It does, yes.

I cannot reproduce the same output without access to the model.

Hence, not deterministic.

I see, that's a very entertaining way of reasoning about the determinism of an entire class of software, thanks for sharing.