by sigmoid10 44 days ago
We are way past "working compilation" when it comes to LLMs. They are already really good at writing readable, compilable code. The big problem with LLMs is making sure the output binary actually does what you wanted it to do. But if you define the goal not as instructions in vague, unspecific human language but as recreating a given set of binary instructions after compilation, this big drawback goes away. So in a sense they are better suited for recompilation projects than for developing new applications.

My point is that we were past "working compilation" well before LLMs, and I do not think anything in LLMs helps with it; at best, agents use these tools with the same efficiency. I disagree that they're good at writing compilable code, but agree on the readable part.
Which decompiler reliably produced working, high-level C/C++ from assembly? I would have loved to use this thing you are describing here 15 years ago. Compilation is inherently lossy, so any system that could have given you this would have needed pretty heavy LLM-like features anyway.

>I disagree that they're good at writing compilable code

That was never part of the discussion, because as explained several times now it is irrelevant in this case. The existence of the original binary means all you need to do is match up things, which can be automated completely.
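
To make the "match up things" step concrete, here is a minimal sketch of the check, assuming you have already extracted the machine-code bytes of one function from the original binary and from a freshly compiled candidate. The names `MatchResult` and `compare_function_bytes` are hypothetical and not taken from any existing tool, and the extraction and compilation steps are left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchResult:
    matched: bool                   # True only if both byte strings are identical
    first_mismatch: Optional[int]   # offset of the first differing byte, if any
    score: float                    # fraction of positions that agree (0.0 to 1.0)


def compare_function_bytes(original: bytes, candidate: bytes) -> MatchResult:
    """Compare the original function's bytes with a recompiled candidate."""
    common = min(len(original), len(candidate))

    # Count positions where both byte strings agree.
    same = sum(1 for i in range(common) if original[i] == candidate[i])

    # Locate the first differing offset within the common prefix.
    first = next((i for i in range(common) if original[i] != candidate[i]), None)

    # A pure length difference counts as a mismatch at the end of the shorter one.
    if first is None and len(original) != len(candidate):
        first = common

    longest = max(len(original), len(candidate))
    score = same / longest if longest else 1.0
    return MatchResult(first is None, first, score)


if __name__ == "__main__":
    # Illustrative bytes only: push ebp; mov ebp, esp; ret
    original = bytes.fromhex("5589e5c3")
    candidate = bytes.fromhex("5589e4c3")
    print(compare_function_bytes(original, candidate))
```

The score and the first mismatch offset are the kind of signal an automated loop could feed back into the next attempt, so that no human has to judge whether the candidate "does what you wanted".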

I do not understand what is so hard about "generating working code". Even the free version of Hex-Rays was doing it 15 years ago, and I have written one in my company that I have used for over 30 years. It's actually ... trivial?

The problem is readability. No one in his right mind would call what they generate "C++". Mine still interjects inline assembler from time to time (and not the newer style that GCC supports, but the older MSVC style).

LLMs absolutely do not help with the "generate working code" part, because this is an exact problem that neither needs nor benefits from an LLM (other than maybe automating stupid iteration?). They can help with the readability part, because once you already have a working skeleton it doesn't matter that much if they make mistakes, as those are easy to detect.

I already asked, but I guess I'll need to ask again: Please show me this tool. Hex-rays is certainly the wrong answer, because the decompiled C code usually needs tons of manual cleaning, fixing datatypes and reconstructing function prototypes before you can compile. And even then you can't be sure about functional (much less binary) equivalence. If anything, all these traditional decompilers focused on readability, not recompilability. But even there they were much worse than LLMs.

If what you said were true, the projects mentioned above wouldn't have needed years of arduous work before the age of LLMs.

I get the point, but note that (custom) datatypes and function prototypes are for readability. They are not required for working or functionally equivalent code.
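
A small illustration of that claim, as a sketch only: Python stands in here for the C distinction between a named struct and raw pointer arithmetic. The `Point` layout and both helper names are made up for the example. The typed read and the untyped offset read return the same value from the same bytes; the type only makes the intent legible.

```python
import ctypes
import struct


class Point(ctypes.LittleEndianStructure):
    # Hypothetical recovered datatype: two 32-bit signed integers.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


def read_y_typed(buf: bytes) -> int:
    """Readable version: access the field through the named structure."""
    return Point.from_buffer_copy(buf).y


def read_y_raw(buf: bytes) -> int:
    """Decompiler-style version: read 4 bytes at offset 4, no datatype."""
    return struct.unpack_from("<i", buf, 4)[0]


if __name__ == "__main__":
    raw = struct.pack("<ii", 7, -3)
    print(read_y_typed(raw), read_y_raw(raw))
```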