by cobbal 266 days ago
There are four important components to describing a compiler: the source language, the target language, and the meaning (semantics, in compiler-speak) of each of those languages.

We call a C->asm compiler "correct" if it turns every valid C program into an assembly program with equivalent meaning.
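To make that definition concrete, here is a minimal sketch in Python using a toy expression language rather than C. The names `eval_source`, `compile_expr`, `run_target`, and `preserves_meaning` are invented for illustration, and the only operators assumed are "add" and "mul".

```python
# Toy source language: nested tuples such as ("add", 2, ("mul", 3, 4)).
# Toy target language: a list of stack-machine instructions.

def eval_source(expr):
    """Source semantics: the meaning of an expression is its integer value."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = eval_source(left), eval_source(right)
    return a + b if op == "add" else a * b

def compile_expr(expr):
    """The compiler: translate a source expression into stack-machine code."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    return compile_expr(left) + compile_expr(right) + [(op,)]

def run_target(code):
    """Target semantics: run the instructions and return the top of the stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return stack[-1]

def preserves_meaning(expr):
    """Correctness for one program: source meaning equals compiled meaning."""
    return eval_source(expr) == run_target(compile_expr(expr))
```

A correct compiler is one where `preserves_meaning` holds for every valid expression, not just the ones somebody happened to try.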

The reason LLMs don't work like other compilers is not that they're non-deterministic; it's that the source language is ambiguous.

LLMs can never be "correct" compilers, because there's no definite meaning assigned to English. Even if English had a precise meaning, LLMs would never be able to accurately turn any arbitrary English description into a C program.

Imagine how painful development would be if compilers produced incorrect assembly for 1% of all inputs.
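As a rough back-of-the-envelope figure, and assuming purely for illustration that each compilation unit is miscompiled independently with probability 1%, the chance that a whole build comes out right shrinks quickly. The function name `chance_all_correct` is made up for this sketch.

```python
def chance_all_correct(units, error_rate=0.01):
    """Probability that every one of `units` independent compilations is correct."""
    return (1 - error_rate) ** units

# Under the independence assumption, a 100-file project builds
# entirely correctly only about 37% of the time.
print(round(chance_all_correct(100), 3))  # 0.366
```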

1 comment

English does have precise meaning if it is constructed to be precise. The issue is that LLMs do not assign meaning the way humans do. Humans translate English meaning into code every day just fine, and sometimes that results in bugs as well.

The LLM in this loop is the equivalent of a human, who also has an ambiguous source language if we’re going by your theory of English being ambiguous. So it sounds like you’re saying that if a human produces a C program, it is not verifiable and testable because the human used an ambiguous source language?

I guess for some reason people thought I meant that the compiler would be LLM > machine code, when actually I meant the compiler would still be whatever takes the language the LLM produces down to machine code. It's just that the code the LLM produces can be checked through things like TDD, a human, etc.
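A minimal sketch of what that check could look like in Python. Everything here is hypothetical: `passes_tests`, the `CASES` table, and the two `generated_*` functions, which stand in for code an LLM might produce from the description "return the larger of two integers".

```python
def passes_tests(candidate, cases):
    """Return True if candidate(*args) == expected for every (args, expected) case."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed test.
            return False
    return True

# Hypothetical spec: "return the larger of two integers".
CASES = [((1, 2), 2), ((5, 3), 5), ((-1, -1), -1)]

def generated_ok(a, b):
    # Stands in for a correct LLM output.
    return a if a >= b else b

def generated_buggy(a, b):
    # Stands in for an incorrect LLM output.
    return a

print(passes_tests(generated_ok, CASES))     # True
print(passes_tests(generated_buggy, CASES))  # False
```

Passing shows agreement on the listed cases only, so the strength of the check depends on how well the cases pin down the intended behavior.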