by carom 531 days ago
Most decompilers do not strive for recompilability. [1] I believe there are (or were) some academic projects that aimed for recompilation as a core feature, but it is a hard problem.

On the commercial side, IDA / HexRays [2] is very strong for C-like decompilation. If you're looking at Go, Rust, or even C++, the output is going to be messier. As other commenters have said, you'll work function by function. IDA is expensive, though the free version does have decompilation (F5) for x86 and x64 (IIRC).

Binary Ninja [3] (no affiliation) is the coolest IMO: it lifts the assembly through multiple intermediate representations, so you get assembly -> low level IL -> medium level IL -> high level IL. There are also SSA (static single assignment) forms that can aid in programmatic analyses. The high level IL is very readable but makes no effort to be compilable as a programming language. That being said, Binary Ninja has implemented different "views" on the HLIL so you can show it as pseudo-C, Rust, etc. There is a free online version, and the commercial version is cheaper than IDA but still expensive. Good Python API, good UI.
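To make the SSA point concrete, here is a minimal sketch of SSA renaming over straight-line code. It is not Binary Ninja's API or implementation; the tuple instruction format, the `to_ssa` helper, and the `name#version` notation are assumptions for illustration only. With no branches there are no phi nodes, so this only shows the core idea that every assignment gets a fresh version.

```python
# Toy SSA renaming for straight-line three-address code.
# Hypothetical format: (dest, op, operands). No branches, so no phi nodes.

def to_ssa(instructions):
    version = {}  # variable name -> latest version number assigned so far
    out = []

    def use(operand):
        # Integer constants pass through; a variable that was never
        # assigned (a function input) is read as version 0.
        if isinstance(operand, int):
            return operand
        return f"{operand}#{version.get(operand, 0)}"

    for dest, op, operands in instructions:
        # Rename the reads first, then bump the destination's version,
        # so "eax = add eax, 4" reads the old eax and defines a new one.
        renamed = [use(o) for o in operands]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}#{version[dest]}", op, renamed))
    return out


program = [
    ("eax", "mov", ["ecx"]),
    ("eax", "add", ["eax", 4]),
    ("ebx", "mul", ["eax", "eax"]),
    ("eax", "sub", ["ebx", "eax"]),
]

ssa = to_ssa(program)
for dest, op, operands in ssa:
    print(dest, "=", op, ", ".join(map(str, operands)))
```

Because each name is defined exactly once, a question like "which instruction produced this value?" becomes a dictionary lookup instead of a dataflow walk, which is why SSA forms are handy for scripted analyses.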

Ghidra [4] is the RE framework released by NSA. It is free and open source. It supports a ton of niche architectures. This is what most people use. I think the UI is awful, personally. It has a decompiler, and the results are OK. It has an intermediate representation (P-Code), and plugins are in Java (since Ghidra itself is written in Java). I haven't worked much with it.

Most online decompilations you see for old games are likely using Ghidra; some might be using IDA. This is largely a manual process of doing one function at a time and building up a mental map of the program and how things interact.

Also worth mentioning are lifters. There were a few projects that aimed to lift assembly to LLVM IR (the LLVM compiler framework's intermediate representation), with the idea being that all your analyses could then be written over LLVM IR as a lingua franca. Since the result is LLVM IR, it would also be recompilable and retargetable. [5][6]
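As a rough illustration of what lifting means, here is a toy sketch that turns a made-up, three-instruction x86-like listing into textual LLVM IR. It is not how RetDec or McSema work internally; the instruction tuples, the `lift` function, and the choice of modelling registers as stack slots are all assumptions, and CPU flags are ignored entirely. The emitted IR uses the `ptr` type, so it assumes a recent LLVM with opaque pointers.

```python
# Toy lifter: hypothetical (mnemonic, dst, src) tuples -> textual LLVM IR.
# Registers are modelled as stack slots; flags are not modelled at all.

REGS = ["eax", "edi", "esi"]
BINOPS = {"add": "add", "sub": "sub", "imul": "mul"}


def lift(name, instructions):
    lines = [f"define i32 @{name}(i32 %edi_in, i32 %esi_in) {{", "entry:"]
    for reg in REGS:
        lines.append(f"  %{reg} = alloca i32")
    # Seed the register slots: arguments arrive in edi/esi, eax starts at 0.
    lines.append("  store i32 0, ptr %eax")
    lines.append("  store i32 %edi_in, ptr %edi")
    lines.append("  store i32 %esi_in, ptr %esi")
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"%t{counter}"

    def read(operand):
        # Register operands become a load; anything else is an immediate.
        if operand in REGS:
            tmp = fresh()
            lines.append(f"  {tmp} = load i32, ptr %{operand}")
            return tmp
        return str(int(operand))

    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            value = read(src)
        elif mnemonic in BINOPS:
            lhs, rhs = read(dst), read(src)
            value = fresh()
            lines.append(f"  {value} = {BINOPS[mnemonic]} i32 {lhs}, {rhs}")
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
        lines.append(f"  store i32 {value}, ptr %{dst}")

    # Return whatever ended up in eax.
    result = fresh()
    lines.append(f"  {result} = load i32, ptr %eax")
    lines.append(f"  ret i32 {result}")
    lines.append("}")
    return "\n".join(lines)


ir = lift("lifted", [("mov", "eax", "edi"), ("imul", "eax", "esi"), ("add", "eax", 1)])
print(ir)
```

Once the code is in this form, standard LLVM passes can clean up the loads and stores, and the same IR can be compiled for a different target. The hard parts real lifters face (recovering control flow, stack layout, calling conventions, and flags) are exactly what this sketch leaves out.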

1. https://reverseengineering.stackexchange.com/questions/2603/...

2. https://hex-rays.com/ida-free

3. https://binary.ninja/free/

4. https://ghidra-sre.org/

5. https://github.com/avast/retdec

6. https://github.com/lifting-bits/mcsema


Meta has a foundation model trained on LLVM IR: https://ai.meta.com/research/publications/meta-large-languag...
lol ok, now we’re getting into pure-nonsense territory
It's not clear to me why that is so. An LLM trained on IR for the purpose of compilation is not quite what we're looking for here but it is in the same territory.