

by jcranmer 2383 days ago

> I don't know much about web assembly, but x86, which is much more complicated with thousands of instructions, has been successfully reverse engineered basically since forever. There are decompilers that can automatically reconstruct source code in C or C++ from a binary blob.

That's a bit of an overstatement.

Disassembly of native executables is essentially a solved problem, and has been for decades. There is some variation in terms of how you define disassembly and how you deal with code that specifically tries to defeat disassembly, but it's solved enough that objdump -d is a decently effective tool.

Decompilation is more difficult. There were academic-quality decompilers by the 1990s, but these weren't really usable and tended to break on anything more complicated than toy examples. The JVM breathed new life into decompilers, and it wasn't until that point that you got decompilers that could routinely output recompilable code (and only in the Java domain).

In the mid-noughts, decompilation efforts returned to targeting native binaries. This was helped by the developers of IDA Pro (the main tool used for reverse engineering) building a decompiler view into their application. There has also been more work on accurate static binary translation into IRs such as LLVM, which is often close enough to C to be effective, and I'm more familiar with these efforts than I am with full decompilers.

Producing fully recompilable C source code from binaries is still a challenge, in part because machine semantics are better defined than C's, so you basically have a tradeoff between readable output and semantically correct output (free of undefined behavior). Control-flow recovery is still challenging; signatures are needed to deal with statically linked pieces of the standard library; and structure and type recovery is routinely of extremely poor quality.
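
To make the readable-versus-correct tradeoff concrete, here is a minimal sketch in C. It assumes the binary contains a plain 32-bit integer add, which wraps on overflow on typical hardware. The function names add_readable and add_faithful are hypothetical, made up for illustration, and are not the output of any particular decompiler.

```c
#include <stdint.h>

/* Readable form: roughly what a human would write for a 32-bit add.
 * Signed overflow is undefined behavior in C, so this does not promise
 * the wraparound that the machine instruction performs. */
int32_t add_readable(int32_t a, int32_t b) {
    return a + b;
}

/* Faithful form: the addition is done on unsigned values, where C
 * defines wraparound modulo 2^32. The result is returned as uint32_t
 * to avoid an implementation-defined conversion back to a signed type. */
uint32_t add_faithful(int32_t a, int32_t b) {
    return (uint32_t)a + (uint32_t)b;
}
```

The two functions agree whenever the sum fits in an int32_t. They differ only in what C guarantees on overflow: the second is defined for every input but is noisier to read, and that noise multiplies across a whole program.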