by a2code
819 days ago
I would devise a somewhat loose metric. Suppose you assign a percentage for how much of a binary has been decompiled: 0% means the binary is still entirely assembly and 100% means the whole binary is now C code. The ideal decompiler would reach 100% for any binary. My prediction is that this percentage will increase over time. It would be interesting to collect data for this metric. It is also important to define the limitations of using LLMs for this endeavor. I would like to emphasize your subtle point: the compiler used for the original binary may not be the same as the one you use. The probability of this increases with time, as compilers improve or the platform on which the binary runs becomes obsolete. This is a problem for validation, because you cannot directly compare the original assembly with the assembly you get by recompiling the decompiled C code. Perhaps each assembly routine could be given a likelihood, i.e. how confident the LLM is that some C code maps to that assembly. Routines with hand-coded assembly would then have a lower likelihood.
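One way to make the metric and the likelihood idea concrete is sketched below in C. It is only an illustration under stated assumptions: the `routine` struct, `decompiled_percent`, `count_low_confidence` and the confidence threshold are all hypothetical, and weighting by machine-code bytes is one choice among several (counting functions would be another). Routines scoring below the threshold are just flagged as candidates for hand-coded assembly, not proven to be so.

```c
#include <stddef.h>

/* Hypothetical per-routine record; every field is an illustrative assumption. */
typedef struct {
    const char *name;
    size_t size_bytes;   /* machine-code bytes belonging to the routine */
    int lifted;          /* 1 if the routine was turned into C, 0 otherwise */
    double confidence;   /* assumed 0..1 score that the C maps back to this asm */
} routine;

/* Share of the binary's code bytes that ended up as C, in percent (0..100). */
double decompiled_percent(const routine *r, size_t n) {
    size_t total = 0, lifted = 0;
    for (size_t i = 0; i < n; i++) {
        total += r[i].size_bytes;
        if (r[i].lifted)
            lifted += r[i].size_bytes;
    }
    /* An empty binary is reported as 0% rather than dividing by zero. */
    return total ? 100.0 * (double)lifted / (double)total : 0.0;
}

/* Count routines whose confidence is below the threshold, e.g. possible hand-coded assembly. */
size_t count_low_confidence(const routine *r, size_t n, double threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].confidence < threshold)
            count++;
    return count;
}
```

For example, a binary with a 300-byte routine that was lifted and a 100-byte routine that was not would score 75%.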
The problem isn’t lifting to C code, but rather producing “good C code”. For example, you can do a 1-to-1 translation of each assembly instruction into C code that makes the same machine state changes. This is not usually what you want, as it comes with a lot of extra cruft. When people think “decompiler”, they think of an output that looks like what they would personally write. But that’s very ill-defined, and personally idk how one would define such a thing.
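To illustrate the difference, here is a minimal sketch assuming a hypothetical x86-64 routine that sums an array of 32-bit ints (the assembly is shown in the comments). `cpu_state`, `sum_lifted` and `sum_readable` are made-up names, not the output of any real decompiler. The lifted version mirrors each instruction as an update to a struct of registers and flags, while the readable version is what a person would write.

```c
#include <stdint.h>

/* Hypothetical register file standing in for the machine state. */
typedef struct {
    uint64_t rax, rcx, rdi, rsi;
    int flag_lt;  /* signed "less than" result of the last cmp */
} cpu_state;

/* 1-to-1 lifting: rdi = array pointer, esi = element count, result in eax. */
void sum_lifted(cpu_state *s) {
    s->rax = 0;                                      /* xor eax, eax */
    s->rcx = 0;                                      /* xor ecx, ecx */
loop:
    s->flag_lt = (int32_t)s->rcx < (int32_t)s->rsi;  /* cmp ecx, esi */
    if (!s->flag_lt) goto done;                      /* jge .done */
    /* add eax, [rdi + rcx*4]; writing a 32-bit register zero-extends it */
    s->rax = (uint32_t)(s->rax + *(const int32_t *)(uintptr_t)(s->rdi + s->rcx * 4));
    s->rcx = (uint32_t)(s->rcx + 1);                 /* inc ecx */
    goto loop;                                       /* jmp .loop */
done:
    return;                                          /* ret */
}

/* What a human would write for the same routine. */
int sum_readable(const int32_t *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Both compute the same sum for small inputs, but they are not identical: the lifted version wraps on overflow like the hardware, while signed overflow in `sum_readable` is undefined behavior in C. Small semantic gaps like this are part of why "good C code" is hard to pin down.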