|
I 120% agree with what you're saying, but emitting valid C is kind of part of what you're asking for, in design terms. Our goal is to omit every cast that can be omitted without changing the semantics according to C. In fact, we have a PR doing exactly this (still on the old repo; hopefully it will go in soon). But how can you expect to be strict about what C allows you to do implicitly if you're not even emitting valid C? For instance, because we emit valid C, we could test whether the assembly emitted by a compiler is the same before and after removing redundant casts. My point is that emitting valid C is a prerequisite for what you're asking: a rather low bar, but one that, in practice, no mainstream decompiler clears. It's pretty obvious that decompiled code will often be redundant or outright wrong if you don't even guarantee it's syntactically valid. Clearly it's not a panacea, but it's an important design criterion and shows the direction we want to go. As for comments: we still haven't implemented inline comments, but they will be attached to program addresses, so they will be available in both the disassembly and the decompiled C. It's not very hard to do, but it needs some love. |
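To make the redundant-cast point concrete, here is a minimal C sketch. The function names are made up for illustration and are not taken from any decompiler's output, and it assumes a platform where int is 32 bits. The first two functions differ only by casts that C's integer promotions would apply anyway; in the third, the cast truncates, so dropping it would change the result.

```c
#include <stdint.h>

/* Decompiler-style output: every cast here restates a conversion
   that C already performs implicitly (assuming 32-bit int). */
int32_t sum_with_casts(int32_t a, int16_t b) {
    return (int32_t)((int32_t)a + (int32_t)b);
}

/* Same semantics with the redundant casts removed:
   b is promoted to int before the addition either way. */
int32_t sum_without_casts(int32_t a, int16_t b) {
    return a + b;
}

/* This cast is NOT redundant: it truncates x to its low 8 bits
   before the value is widened back to 32 bits. */
uint32_t low_byte(uint32_t x) {
    return (uint8_t)x;
}
```

Compiling the first two functions with the same flags and diffing the generated assembly is the kind of before/after check described above, and it only works if both versions are valid C in the first place.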
In my experience with Ghidra, I've seen far too many cases where it starts with the wrong types for something and the result becomes gibberish, sometimes just plain dropping stuff altogether. There are some cases where it's clearly just poor analysis on Ghidra's part (e.g., it doesn't seem to understand stack slot reuse, and memcpy-via-xmm is very confusing to it). And Ghidra's type system lacks function pointer types, which is very annoying when you're dealing with vtable-heavy C++ code.
I do like the appeal of a recompilable target language. But that language need not be C. In fact, I'm sketching out the design of such a language for my own purposes, so that I can read LLVM IR without going crazy (which means I need to distinguish between, e.g., add nuw and just plain add).
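As a rough illustration of why that distinction matters, the C sketch below models the two flavours of unsigned addition. In LLVM IR, plain add wraps modulo 2^n, while add nuw yields a poison value if the unsigned result wraps. The helper names and the explicit poison flag are invented for this example; they are not LLVM APIs, and real poison semantics are richer than a single boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain `add`: wraps modulo 2^32 and is always well defined. */
uint32_t add_wrapping(uint32_t a, uint32_t b) {
    return a + b;
}

/* Hypothetical model of `add nuw`: the same result bits, plus a flag
   recording whether the "no unsigned wrap" promise was broken. */
typedef struct {
    uint32_t value;
    bool poison;
} nuw_result;

nuw_result add_nuw_model(uint32_t a, uint32_t b) {
    nuw_result r;
    r.value = a + b;
    /* An unsigned sum wrapped exactly when it is smaller than an operand. */
    r.poison = r.value < a;
    return r;
}
```

Plain C has only the first operation for unsigned types, which is one reason a reading language for LLVM IR may need its own spelling for the second.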
Analysis necessarily involves multiple levels. Given that a lot of the type analysis today tends to be crap, I'd rather have a more solid first-level analysis that does variable recovery and works out function calling conventions, so that it can inform my own reverse engineering of structures, or of questions like "does this C++ method return a non-trivial struct that is an implicit first parameter?"
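The "implicit first parameter" question can be sketched in plain C. Many ABIs return large or non-trivial aggregates through a hidden pointer supplied by the caller, so the source-level signature and what shows up in the binary differ. The struct and function names below are hypothetical, and where the hidden pointer sits relative to a C++ this pointer depends on the ABI.

```c
#include <stdint.h>

/* Hypothetical aggregate, large enough that typical ABIs
   return it in memory rather than in registers. */
struct Big {
    int64_t fields[4];
};

/* What the source says: the struct is returned by value. */
struct Big make_big(int64_t seed) {
    struct Big b;
    for (int i = 0; i < 4; i++) {
        b.fields[i] = seed + i;
    }
    return b;
}

/* What the binary tends to look like after lowering: the caller
   passes the address of the result as an extra leading argument. */
void make_big_lowered(struct Big *out, int64_t seed) {
    for (int i = 0; i < 4; i++) {
        out->fields[i] = seed + i;
    }
}
```

A first-level analysis that recognizes the second shape as the first is what lets later type recovery treat out as a return value and not as an ordinary pointer argument.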
(Also, since I'm largely looking at C++ code in practice, I'd absolutely love to be able to import C++ header files to fill in known structure types.)