by Vt71fcAqt7 508 days ago
>And AI--especially LLMs--are notoriously bad at the "correct" part of translation.

Can't you just compare the compiled binaries to see if they are the same? Is the issue that you don't have the full toolchain, so the two compilers produce different outputs? Thinking about it, though, you could probably use those same differences to figure out which compiler was used.

3 comments

It can take quite a bit of engineering just to get the same source to produce the same results in many C or C++ toolchains. "Reproducible builds" require work; all sorts of trivial things like the length of pathnames can perturb the results. Not to mention needing the same optimizer flags.

"Do these two binaries always behave the same for the same inputs" is practically an unsolvable problem in general. You can get fairly close with something like AFL (American fuzzy lop, a fuzzer and also a type of rabbit).
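To make "fairly close" concrete, here is a minimal sketch of differential testing: feed the same random inputs to two implementations and stop at the first disagreement. This is plain random testing, not AFL's coverage-guided mutation, and `reference_checksum`, `candidate_checksum`, and `differential_test` are made-up stand-ins for the original and recompiled binaries.

```python
import random

def reference_checksum(data: bytes) -> int:
    """Hypothetical stand-in for the original binary's behavior."""
    total = 0
    for b in data:
        total = (total + b) & 0xFFFF
    return total

def candidate_checksum(data: bytes) -> int:
    """Hypothetical stand-in for the recompiled binary.
    Deliberately buggy: it ignores everything past the first 64 bytes."""
    return sum(data[:64]) & 0xFFFF

def differential_test(f, g, trials=1000, max_len=128, seed=0):
    """Return the first random input on which f and g disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        if f(data) != g(data):
            return data
    return None

counterexample = differential_test(reference_checksum, candidate_checksum)
```

A `None` result only means no difference showed up in the inputs that were tried. It is evidence, not a proof of equivalence.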

(Someone should really make an LLM bot that scans HN for instances of "just" and explains why you can't just do that; it's such a red flag word.)

The expected outcome of using an LLM to decompile is a binary that is so wildly different from the original that they cannot even be compared.

If you only make mistakes very rarely and in places that don't cause cascading analysis mistakes, you can recover. But if you keep making mistakes all over the place and vastly misjudge the structure of the program over and over, the entire output is garbage.

That makes sense. So it can work for small functions but not for an entire codebase, which is the goal. Does that sound correct? If so, is it useful for small functions (say I identify some sections of code I think are important because they modify some memory location), or is it not useful?

There are lots of parts of analysis that really matter for readability but aren't used as inputs to other analysis phases, so mistakes there are okay.

Things like function and variable names. Letting an LLM pick them would be perfectly fine, as long as you make sure the names are valid and not duplicates before outputting the final code.

Or if there are several ways to display some really weird control flow structures, letting an LLM pick which to do would be fine.

Same for deciding what code goes in which files and what the filenames should be.

Letting the LLM comment the code as it comes out would work too, as if the comments are misleading you can just ignore or remove them.
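The name check mentioned above (valid, no duplicates) could look roughly like this. It is a sketch that assumes the decompiler emits C and keeps the tool's placeholder whenever a suggestion is rejected; `accept_names`, the placeholder names, and the partial keyword set are all made up for illustration.

```python
import re

# C identifier syntax. The keyword set is deliberately partial (assumption: C output).
C_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
C_KEYWORDS = {"int", "char", "if", "else", "for", "while", "return", "struct", "void"}

def accept_names(suggestions):
    """Map each placeholder to the LLM's suggestion if it is usable,
    otherwise keep the placeholder. Output names are always unique."""
    reserved = set(suggestions)  # placeholders may be needed as fallbacks
    used = set()
    final = {}
    for placeholder, proposed in suggestions.items():
        ok = (
            C_IDENT.fullmatch(proposed) is not None
            and proposed not in C_KEYWORDS
            and proposed not in reserved
            and proposed not in used
        )
        name = proposed if ok else placeholder
        used.add(name)
        final[placeholder] = name
    return final

renamed = accept_names({
    "sub_401000": "parse_header",
    "sub_401080": "parse_header",  # duplicate, rejected
    "sub_401100": "2fast",         # not a valid identifier, rejected
    "sub_401180": "while",         # keyword, rejected
    "sub_401200": "checksum",
})
```

Since a bad suggestion only costs you a nicer name, the fallback never changes what the code does.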

No, but for verifying equivalence you could use some symbolic approach that is provably correct. The LLM could help there by making its output verifiable.

Program equivalence is undecidable in general, but also in practice (in my experience) most interesting cases quickly escalate to require an unreasonable amount of compute. Personally, I think it is easier to produce correct-by-construction decompilation by applying sequences of known-correct transformations, rather than trying to reconstruct correctness a posteriori. So perhaps the LLM could produce such a sequence of transforms rather than outputting only the final decompiled program.

Yes, something like this. The intermediate verified steps wouldn't have to be shown to the user.
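A toy sketch of the "sequence of known-correct transforms" idea, assuming a tiny expression tree made of tuples: the LLM only proposes a list of transform names, and the engine applies nothing outside its own whitelist. The three rewrite rules, the `TRANSFORMS` table, and `apply_plan` are all hypothetical; a real decompiler's IR and rule set would be far larger.

```python
# Expressions: ('var', name), ('const', n), or (op, left, right).

def add_zero(e):
    # x + 0  ->  x
    if e[0] == "add" and e[2] == ("const", 0):
        return e[1]
    return e

def xor_self_to_zero(e):
    # x ^ x  ->  0
    if e[0] == "xor" and e[1] == e[2]:
        return ("const", 0)
    return e

def shl1_to_mul2(e):
    # x << 1  ->  x * 2
    if e[0] == "shl" and e[2] == ("const", 1):
        return ("mul", e[1], ("const", 2))
    return e

# Whitelist of rules that are each semantics-preserving on their own.
TRANSFORMS = {f.__name__: f for f in (add_zero, xor_self_to_zero, shl1_to_mul2)}

def rewrite(e, rule):
    """Apply one rule bottom-up over the whole expression."""
    if e[0] in ("var", "const"):
        return rule(e)
    op, left, right = e
    return rule((op, rewrite(left, rule), rewrite(right, rule)))

def apply_plan(expr, plan):
    """Apply the named transforms in order; reject anything not whitelisted."""
    for name in plan:
        if name not in TRANSFORMS:
            raise ValueError(f"unknown transform: {name}")
        expr = rewrite(expr, TRANSFORMS[name])
    return expr

expr = ("add", ("shl", ("var", "x"), ("const", 1)),
               ("xor", ("var", "y"), ("var", "y")))
plan = ["xor_self_to_zero", "add_zero", "shl1_to_mul2"]  # what the LLM would emit
result = apply_plan(expr, plan)
```

A bad plan can make the output uglier, but it cannot make it wrong, because every step the engine is willing to take preserves meaning.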