|
|
|
|
|
by jcranmer
812 days ago
|
|
Here's my issue with decompilers: I don't want to look at assembly code. I'd rather see expression trees, expressed in C-like syntax, than trying to piece together variables from two-address or three-address instructions. Looking at assembly tends to lead to brain farts like "wait, was the first or second operand the output operand?" (really, fuck AT&T syntax) or "wait, does ja implement ugt or sgt?" So that means I want to look at something vaguely C-like. But the problem is that the C type system is too powerful for decompilers to robustly lift to, and the resulting code is generally at best filled with distractions of wait-I-can-fix-this excessive casting and at worst just wrong. And when it's wrong, I have to resort to staring at the assembly, which (for Ghidra at least) means throwing away a lot of the notes I've accumulated because they don't correlate back to underlying assembly. So what I really want isn't something that can emit recompilable C code, that's optimizing for something that doesn't help me in the end. What I want is robust decompilation to something that lets me ignore the assembly entirely. I'm a compiler writer, I can handle a language where integers aren't signed but the operands are. |
|
Our goal is: omit all the casts that can be omitted without changing the semantics according to C. In fact, we have a PR doing exactly this (still on the old repo, hopefully it will go in soon).
But, how can you expect to be able to be strict with what C allows you to do implicitly, if you're not even emitting valid C? For instance, thanks to the fact that we emit valid C, we could test if the assembly emitted by a compiler is the same before and after removing redundant casts.
My point is that emitting valid C is kind of a prerequisite for what you're asking, a rather low bar to pass, but that, in practice, no mainstream decompiler passes. It's pretty obvious the decompiled code will often be redundant and outright wrong if you don't even guarantee it's syntactically valid. Then clearly it's not a panacea, but it's an important design criterion and shows the direction we want to go.
As for comments: we still haven't implemented inline comments, but they will be attached to program addresses, so they will be available both in disassembly and decompiled C. It's not very hard to do, but that needs some love.