| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pron 34 days ago
	The experiment failed to produce a workable C compiler despite 1. the job not being particularly hard, 2. the available specs and tests are of a completely higher class of quality than almost any software, not to mention the availability of other implementations that the model trained on. You can call that a success (as it did something impresssive even though it failed to produce a workable C compiler) but my point in bringing this up was to show that today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.

2 comments

ianbutler 34 days ago

That's great and all, but that's not the point I was making and you're engaging rather uncharitably on it. So when you view it from the perspective of capability increase it's rather impressive. Note the slope of progress which this experiment was to show.

Edit: Maybe uncharitably is too strong, but we're talking past each other.

link

auggierose 34 days ago

pron made this statement:

> It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die.

and brought up the failed anthropic experiment as proof of that. Yes, you are talking past each other, but that is not pron's fault. It is your fault.

link

ianbutler 34 days ago

Eh fair enough!

link

KajMagnus 34 days ago

Saying the model failed to write a competitive C compiler makes more sense.

I don't think they tried to do that though.

> today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.

That's a good point anyway

link

pron 34 days ago

> Saying the model failed to write a competitive C compiler makes more sense.

Their compiler fails to compile (well, at least link) some C programs altogether, and in other cases it produces code that is 150,000x slower than a real C compiler with optimisations turned off (interestingly, the model trained on the real compiler's source code). That's not "not competitive" but "cannot be used in the real world". But even more importantly, the compiler cannot be fixed or evolved. It's bricked (at least as far as today's models' capabilities go). For any kind of software, not being able to improve or fix anything or add any new feature means it's effectively dead.

You could not use it in production even if no other C compiler existed.

link

jiggawatts 34 days ago

While I understand both points of view, I'm leaning towards yours, because:

- John Carmack embedded a C compiler and interpreter/runtime into Quake back in the mid 1990s as a scripting language! It was that efficient that it could be used in a real time 3D shooter. That's a solo effort as a minor component of a much larger piece of software.

- I've seen university CS courses hand out "implement a C compiler" as a homework / project exercise for students. It's not particularly difficult.

Sure, a modern C compiler like GCC has to handle inline assembly, various extensions, pragmas, intrinsics, etc... but like you said, all of those are thoroughly documented and have open source implementations to reference.

Similarly, the Rust compiler is implemented in Rust and could be used as an idiomatic reference for a generic compiler framework with input handling, parsing, intermediate representations, and so forth.

link

lmm 34 days ago

> Their compiler fails to compile (well, at least link) some C programs altogether, and in other cases it produces code that is 150,000x slower than a real C compiler with optimisations turned off

I would bet that those things are also true of at least one expensive commercial C compiler.

link

vajrabum 34 days ago

I'd love to hear of any currently available commerical C compiler which has that level of issues. I would bet you'll be hard pressed to find one. C compilation is a quite thoroughly solved problem. In any case please provide an example.

link