Hacker News new | ask | show | jobs
by abainbridge 86 days ago
> A spec is an envelope that contains all programs that comply. Creating this spec is often going to be harder than writing a single compliant program.

This perfectly explains the feeling I had when, 20 years into my career, I had to start writing specs. I could never quite put my finger on why it was harder than coding. My greater familiarity with coding didn't seem a sufficient explanation.

When writing a line of spec, I had to consider how it might be interpreted in the context of different implementations of other lines of spec - combinatorial nightmare.

But code is just a spec as far as, say, a C compiler is concerned. The compiler is free to implement the assembly however it likes. Writing that spec is definitely easier than writing the assembly (Fred Brookes said this, so it must be true).

So why the difference?

2 comments

C has a simpler mapping to assembly than most languages, so you are doing most of the high level translation when writing C. But even C compilers have considerable scope for weirdness, hence projects like CompCert.

But much of the code we run today is JIT executed, and that leaves ample room for exploiting with weird machines. Eg the TOCTOU in the Corina exploit.

Even at this very low level, full coverage specs require years of careful formal methods work. We have no hope of doing it at for vibe coding, everything will be iterative, and if TDD helps then good, but specs are by no means easier than code.

> But code is just a spec as far as, say, a C compiler is concerned. The compiler is free to implement the assembly however it likes.

Not at all. Code is formal, and going from C to assembly os deterministic. While the rules may be complex, they are still rules, and the compiler can’t stray away from them.

Writing C code is easier than writing assembly because it’s an easier notation for thinking. It’s like when hammering a nail. Doing it with a rock is hard. But using a hammer is better. You’re still hitting the nail with a hard surface, but the handle, which is more suitable for a human hand, makes the process easier.

So programming languages are not about the machine performance, they are about easier reasoning. And good programmers can get you to a level above that with proper abstraction and architecture, and give you concepts that directly map to the problem space (Gui frameworks instead of directly rendering to a framebuffer, OS syscall instead of messing with hardware,…).

> Code is formal, and going from C to assembly is deterministic.

OK, this is the main thing. Going from C to assembly is not deterministic in a sense because different compilers can produce different output. But the behaviour of the generated assembly is always the same. This isn't true of a spec.

> Going from C to assembly is not deterministic in a sense because different compilers can produce different output.

That’s a weird definition of determinism.

Why is this a weird definition of determinism? Could you please define what you mean when you say deterministic?

A C program does not identify a single assembly program. It identifies a set of assembly programs. This fits the pretty standard definition of non-determinism.

A difference between natural language and C code is that natural language does not have a formal semantics. Having no formal semantics is a very different problem from having a semantics that admits a well-defined set of interpretations.

Determinism implies that the same input will result in the same output.

I agree with you that for a single C program there’s a set of assembly code that satisfies it. But by choosing a compiler, an architecture and a set of flags,… you will always get the same assembly code. If you decide to randomize them, then you can no longer guarantee a specific result, but you can still guarantee sets of result. Which is the definition of non-determinism.

Formalism is orthogonal as its about having well defined sets and transformation. LLMs are formal because it’s a finite set of weights and tokens ad the operations are well defined. But the prompt -> tokens -> tokens -> code transformation is non- deterministic in most tools (claude, chatgpt). And the relation between the input and the output os a mathematical one, not a semantic one.

I see. Then we’re on the same page. My follow up question is: why do we care if the LLM is deterministic?

Hypothetically, if we could guarantee a semantic relationship between the input and output we wouldn’t care if the LLM was deterministic. For instance, if I give the LLM a lean theorem and it instantiates a program and a mechanical proof that the program conforms to the lean theorem, I just don’t care about determinism. Edit: this is equivalent to me not caring very much about which particular conformant C compiler I pick

And my understanding of LLMs is that they actually are functions and the observed randomness is an artifact of how we use them. If you had the weights and the hardware, you could run the frontier models deterministically. But I don’t think you’d be satisfied even if you could do that. Edit: this is maybe analogous to picking a particular C compiler that does not promise conformance

There are valid concerns with LLMs but I’m not convinced non-determinism is the thing we should care about.

So do you disagree with the parent that “code is just a spec as far as the C compiler is concerned”?

Maybe it’s important to agree on what a spec is. For instance, do you agree that a spec can be just as formal as code implementing the spec? For instance, if the spec is written in a machine parseable logic do you take that to be a spec? Or are you taking a spec to be written in natural language?

I suspect some of your disagreement with the parent is about this definition. I’m just trying to understand because for me it’s uncontroversial to claim C code is a spec for assembly code, but maybe the issue is that I am not thinking of specs in the colloquial way.