by orbital-decay 16 days ago
>I wonder why people worry so much more about "determinism" over conformance to a spec.

From what I've seen in these HN discussions, most people are using "determinism" when they really mean "prompt sensitivity", i.e. minor variations in framing leading to different results. This, in turn, confuses people who do understand what determinism is supposed to mean and where it's necessary (build reproducibility for example).

>nonconformance to a spec

This is bound to happen with informal specs. It's an inherent property of the domain both models and humans operate in.

Current models are optimized for coding rather than human communication, and have a shallow understanding of human intent. They are pretty poor at reading between the lines and can't distinguish important from unimportant details of the prompt, so they follow it too literally and force you to give them more context for clarification.


> From what I've seen in these HN discussions, most people are using "determinism" when they really mean "prompt sensitivity", i.e. minor variations in framing leading to different results. This, in turn, confuses people who do understand what determinism is supposed to mean and where it's necessary (build reproducibility for example).

For lack of a better word, I'd also have used "determinism". But to borrow a bit from TFA, what I'd really mean by that would be some kind of "semantic determinism": for any input source code in a well-defined language, a correctly working compiler will always produce output that's semantically correct for the input.

Let's say a compiler implementation internally does something random or nondeterministic but that the nondeterminism does not affect the semantics of the output. You could argue that the compiler is technically nondeterministic, but in terms of program semantics it would still be deterministic.
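
To make that concrete, here's a toy sketch. It is purely hypothetical and not how any real compiler works: `compile_sum` and `run` are made-up names, and the "language" is just a sum of integers. The emitted text varies with the seed, but since integer addition is commutative, every variant means the same thing.

```python
import random


def compile_sum(terms, seed):
    """Toy 'compiler' (hypothetical): emit source text for a sum of integers.

    The order of the emitted terms depends on the seed, so the output
    text can differ from run to run.
    """
    rng = random.Random(seed)
    shuffled = list(terms)
    rng.shuffle(shuffled)
    return " + ".join(str(t) for t in shuffled)


def run(source):
    """Toy 'execution': evaluate the emitted sum."""
    return sum(int(token) for token in source.split(" + "))


terms = [1, 20, 300, 4000]

# Different seeds stand in for internal nondeterminism.
outputs = {compile_sum(terms, seed) for seed in range(20)}

# Every emitted text evaluates to the same result.
values = {run(source) for source in outputs}

print(len(outputs), "distinct output texts")
print("distinct results:", values)
```

Comparing the output texts byte for byte would call this "compiler" nondeterministic; comparing what the outputs compute would not. That second comparison is the "semantic determinism" I mean.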

I assume that's what people mean when they say compilers are deterministic in comparison to LLMs.

So in some sense the post is correct, but I think the author is somewhat pedantically misinterpreting the way people use the word "nondeterminism".

IMO prompt sensitivity is something different. A prompt does not unambiguously describe full program semantics in the first place, and the neural network would not contain an explicit mechanism for producing semantically matching output even if it did. Prompt sensitivity comes on top of that but isn't the core matter.

I think you want the impossible, because a) input semantics are non-formal and ambiguous/subjective by definition, and b) the model suffers from the curse of its knowledge being vastly wider than yours and doesn't have enough context to converge on exactly what you want in the huge space of possibilities presented by even the most constraining but still informal inputs.

If you limit your requirement to the difference between your interpretation and the model's being small enough, that's probably doable. Which is realistically what most people want, and most good coding models already have, more or less (with caveats that still need to be addressed, of course). But a hard guarantee of output staying unchanged with different inputs is not possible to give (regardless of whether you think they're unambiguous) due to the nature of intelligence, human or machine.

I'm not asking for LLM tools to be similar to compilers, or saying that they can't be useful if they aren't. I know rather well that the two are different, and that's the point.

That LLMs aren't deterministic in terms of producing semantically correct output just means they aren't similar to compilers. So you probably can't just start blindly trusting that their output matches the input and skip understanding the code, as most people mostly can with compilers.

I think that's what people mean by "determinism" when they compare LLMs to compilers, or when they respond to other people suggesting LLMs are no different from compilers.