by AdieuToLogic 161 days ago
The author presents a false dichotomy when discussing "Why Not AI".

  ... there are some serious costs and reasonable 
  reservations to AI development. Let's start by listing 
  those concerns

  These are super-valid concerns. They're also concerns that 
  I suspect came around when we developed compilers and 
  people stopped writing assembly by hand, instead trusting 
  programs like gcc ...
Compilers are deterministic, making their generated assembly code verifiable (for those compilers which produce assembly code). "AI", such as the "Claude Code (or Cursor)" referenced in the article, is nondeterministic in its output and therefore not comparable to a compiler.

One might as well equate the predictability of a Fibonacci sequence[0] to that of a PRNG[1] since both involve numbers.

0 - https://en.wikipedia.org/wiki/Fibonacci_sequence

1 - https://en.wikipedia.org/wiki/Pseudorandom_number_generator
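
A minimal sketch of that analogy in Python. The names `fib` and `draws` are illustrative and not from the thread. The Fibonacci sequence is fixed by its definition, while a PRNG seeded from OS entropy gives output you cannot predict from the call alone; pinning the seed restores repeatability.

```python
import random
from itertools import islice


def fib():
    """Yield the Fibonacci sequence: 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def draws(n, seed=None):
    """Return n pseudorandom integers in [0, 99].

    With seed=None the generator is seeded from OS entropy (or the clock),
    so the result usually differs between runs.
    """
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(n)]


first_ten = list(islice(fib(), 10))
print(first_ten)                                  # identical on every run
print(draws(10))                                  # usually differs between runs
print(draws(10, seed=42) == draws(10, seed=42))   # fixed seed: repeatable
```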


If LLMs were like compilers, you could put src/ into .gitignore and only upload the prompt.

Even the earliest compilers didn't work by the programmer writing code, copying the assembly output into their source tree, and throwing away the code.

This is not a value judgement; they simply aren't the same thing at all.

Here you go, a prompt-only library: https://github.com/dbreunig/whenwords
That's great. Here's "me" implementing a JS version of that library in one shot using GitHub Copilot and a one-sentence prompt:

> Implement when.js as a simple, zero-dependency js library following SPEC.md exactly.

https://github.com/jncraton/whenwords/pulls

>> Compilers are deterministic, making their generated assembly code verifiable

This is true (to an extent), but the generated LLM code is also verifiable. We use automated tests to do it.

Automated tests are not verification. The "LLM as a compiler" provides zero guarantees about the code.

A compiler offers absolute guarantees that what you write is semantically preserved, barring bugs in the compiler itself. An LLM provides zero guarantees even with zero bugs in the LLM's code.

>> A compiler offers absolute guarantees

I think one of the sibling comments addresses this myth rather neatly: https://news.ycombinator.com/item?id=46563383

tl;dr compilers are not fully deterministic either.

Please point out where I said "deterministic".

I said guarantees that semantics are preserved.

I don't know what you are arguing, or why. Please follow the thread in its full context. Specifically, the argument the article author is making is that moving to a higher level of abstraction also cost developers the benefit of understanding the internals. Ultimately, that ended up not mattering very much.

The OP pushed back on this, saying compilers are deterministic and LLMs are not, and that lack of determinism makes LLM output unverifiable. I said the latter is not true because you can perform verification using tests. You claimed tests are not verification because LLMs don't preserve the semantics.

I'm not sure why semantics matter. LLMs providing no guarantees regarding the preservation of semantics is not important because you can guarantee the behavior of the generated code using tests. In most domains, this is sufficient. You tell the LLM to write code that does X, Y and Z, and then verify X, Y and Z using a test. That's it.

No, writing tests to verify that the "compiled" code semantically matches the code in the source language is not a good thing. The guarantees that I'm talking about are different.

You write tests for your own logic, not to do the compiler's job.

I have no idea why you are so stuck on determinism. That has nothing to do with what I'm saying. Sure, compilers can be nondeterministic in things such as register allocation, but that is totally transparent to the programmer. The compiled code will do exactly what the source code describes. The nondeterminism in LLMs is not limited to those things. An LLM's nondeterminism might mean it decides to encode different logic, instead of a different implementation that is logically equivalent.

We don't usually write steps to verify that the compiler hasn't decided to ignore our code and do its own thing. You have to do that with LLMs.
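
To make the "different logic" versus "logically equivalent implementation" distinction concrete, here is a small Python sketch. The two `median_*` functions and the test suite are hypothetical, invented for illustration. Both candidates pass the same tests, yet they disagree on an input the tests never cover.

```python
def median_a(xs):
    """Median: mean of the two middle values for even-length input."""
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def median_b(xs):
    """Different logic: always returns the upper-middle element."""
    s = sorted(xs)
    return s[len(s) // 2]


def passes_tests(median):
    """A test suite that happens to cover only odd-length inputs."""
    return (
        median([3, 1, 2]) == 2
        and median([5]) == 5
        and median([9, 7, 8, 1, 2]) == 7
    )


print(passes_tests(median_a), passes_tests(median_b))   # both pass
print(median_a([1, 2, 3, 4]), median_b([1, 2, 3, 4]))   # but they disagree here
```

The tests only constrain behaviour on the inputs they exercise; anything outside that set is decided by whichever candidate was generated.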

I suspect the argument is that both AI and a compiler enable building software at a higher level of abstraction.
Abstraction is only useful when it involves a consistent mapping between A and B; LLMs don’t provide that.

In most contexts you can abstract the earth as a sphere and it works fine, e.g. for aligning solar panels, until you enter the realm of precision where treating the earth as a sphere utterly fails. There’s no realistic set of tests you can write under which an unsupervised LLM’s output can be trusted to generate a complex system that works if it’s constantly being recreated. Actual compilers don’t have that issue.

> Compilers are deterministic, making their generated assembly code verifiable

People keep saying this like it is an absolute fact, whereas in reality it is a scale.

Compilers are more deterministic than LLMs in general, but no, they are not completely deterministic. That's why making reproducible builds is hard!

https://stackoverflow.com/questions/52974259/what-are-some-e... and https://github.com/mgrang/non-determinism give some good examples of this.
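
As a rough illustration of why reproducible builds are hard, here is a toy Python sketch. The `build`, `digest` and `behaviour` functions are made up for this example and stand in for a compiler that embeds a build timestamp, loosely like a C file expanding `__DATE__`. The output bytes differ between builds even though the program itself is unchanged.

```python
import hashlib


def build(source: str, build_time: int) -> bytes:
    """Toy "compiler": copies the source and embeds a build timestamp."""
    return f"{source}\n# built at {build_time}\n".encode()


def digest(artifact: bytes) -> str:
    """SHA-256 of the artifact, as used when comparing builds byte for byte."""
    return hashlib.sha256(artifact).hexdigest()


def behaviour(artifact: bytes) -> str:
    """Everything except the embedded timestamp line."""
    lines = artifact.decode().splitlines()
    return "\n".join(line for line in lines if not line.startswith("# built at"))


src = "print('hello')"
first = build(src, build_time=1_700_000_000)
second = build(src, build_time=1_700_086_400)   # same source, built a day later

print(digest(first) == digest(second))          # bytes differ
print(behaviour(first) == behaviour(second))    # the program does not
print(digest(build(src, 0)) == digest(build(src, 0)))  # pinned time: same bytes
```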

This leads to the point: in general do we care about this non-determinism?

Most of the time, no we don't.

Once you accept that, the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This leads to "how do I verify it is good enough?", which leads to testing, and then suddenly you have a working agentic loop.
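
A minimal sketch of such a generate, test, retry loop in Python. Everything here is an assumption for illustration: `generate_candidate` is a stand-in for an LLM call and simply picks one of three hard-coded implementations at random, and `passes_tests` is a hypothetical acceptance suite.

```python
import random


def generate_candidate(rng):
    """Stand-in for an LLM call (hypothetical): returns one of several
    implementations of "absolute value", some of them wrong."""
    candidates = [
        lambda x: x,                    # wrong for negative inputs
        lambda x: -x,                   # wrong for positive inputs
        lambda x: x if x >= 0 else -x,  # correct
    ]
    return rng.choice(candidates)


def passes_tests(fn):
    """The acceptance tests; the loop can only be as good as these."""
    return fn(3) == 3 and fn(-3) == 3 and fn(0) == 0


def agentic_loop(max_attempts, seed=None):
    """Generate, test, retry.

    Returns (function, attempts_used) on success, or (None, max_attempts)
    if no candidate passed within the attempt budget.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(rng)
        if passes_tests(candidate):
            return candidate, attempt
    return None, max_attempts


fn, attempts = agentic_loop(max_attempts=50, seed=0)
print(fn is not None, attempts)
```

Note that the loop accepts whatever first satisfies `passes_tests`, so the tests, not the generator, define what "good enough" means.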

>> Compilers are deterministic, making their generated assembly code verifiable

> People keep saying this like it is an absolute fact, whereas in reality it is a scale.

My statement is of course a generalization due to its terseness and focuses on the expectation of repeatable results given constant input, excluding pathological definitions of nondeterminism such as compiler-defined macro values or implementation defects. Modern compilers are complex systems and not really my point.

> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

Not generally the type of nondeterminism I described, no. Nor the nondeterministic value of the `__DATE__` macro referenced in the StackOverflow link you provided.

> Once you accept that, the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This is where the wheels fall off.

First, "most of the time" only makes sense when there is another disjoint group of "other times." Second, the preferred group defined is "non-deterministic [sic] output of an LLM is good enough", which means the "other times" are when LLM use is not good enough. Third, and finally, when use of an approach (or a tool) is unpredictable (again, excluding pathological cases) given the same input, it requires an open set of tests to verify correctness over time.

That last point may not be obvious, so I will elaborate on why it holds.

Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt. This implies prompt evolution will also be required at a frequency almost certainly different than unpredictable document generation intrinsic to LLMs. This in turn implies test expectations and/or definitions having to evolve over time with nothing changing other than undetectable model evolution. Which means any testing which exists at one point in time cannot be relied upon to provide the same verifications at a later point in time. Thus the requirement of an open set of tests to verify correctness over time.

Finally, to answer your question of:

  how do I verify it is good enough
You can't, because what you describe is a multi-story brick house built on a sand dune.
> Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt.

So what?

You tell it once. It writes code.

You test that code, not the prompt.

> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

well that’s a sweeping generalisation. i think this is a better generalised answer to your question.

> It depends on the problem we’re trying to solve and the surrounding conditions and constraints.

software engineering is primarily about understanding the problem space.

are 99% of us building a pacemaker? no. but that doesn’t mean we can automatically make the leap to assuming a set of tools known for being non-deterministic are good enough for our use case.

it depends.

> Once you accept that, the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

the next stage is working with whatever tool(s) is/are best suited to solve the problem.

and that depends on the problem you are solving.

> are 99% of us building a pacemaker? no. but that doesn’t mean we can automatically make the leap to assuming a set of tools known for being non-deterministic are good enough for our use case.

This seems irrelevant?

Either way hopefully you test the pacemaker code comprehensively!

That's pretty much the best case for LLM-generated code: comprehensive tests of the desired behaviour.