| I did; you can find the Ghidra extension here: https://github.com/boricj/ghidra-delinker-extension The problem is properly identifying the relocation spots and their targets inside a Ghidra database, which is based on references. On x86 it's fairly easy because there's usually a 4-byte absolute or relative immediate operand within the instruction that carries the reference. On MIPS it's very hard because of split MIPS_HI16/MIPS_LO16 relocations, where the actual reference can be hundreds of instructions away. So you need both instruction flow analysis strong enough to handle large functions and code built with optimizations, as well as pattern matching for the various possible instruction sequences, some of them overlapping and others looking like regular expressions in the case of accessing multi-dimensional arrays. All of that while trying to avoid algorithms with bad worst cases, because they'll take too long to run on large functions (each ADDU instruction generates two paths to analyze because of its two source registers). Besides that, you're working on top of a Ghidra database mostly filled by Ghidra's analyzers, which aren't perfect. Incorrect data within that database, like constants mistaken for addresses, off-by-n references or missing references, will lead to very exotic undefined behavior in the delinked code unless cleaned up by hand. I have some diagnostics to help identify some of these cases, but it's very tricky. On top of that, the delinked object file doesn't have debugging symbols, so it's a challenge to figure out what's going wrong with a debugger when there's a failure in a program that uses it. It could be an immediate segmentation fault, or the program can run without crashing but follow an incorrect execution flow or generate incorrect output. I've thought about generating DWARF or STABS debugging data from Ghidra's database, but it sounds like yet another rabbit hole. 
I'm on my fifth or sixth iteration of the MIPS analyzer, each one better than the previous one, but it's still choking on kilobytes-long functions. Also, I've only covered 32-bit x86 and MIPS on ELF for C code. The matrix of ISAs and object file formats (ELF, Mach-O, COFF, a.out, OMF...) is rather large. C++ or Fortran would require special considerations for COMMON sections (vtables, typeinfos, inline functions, default constructors/destructors, implicit template instantiations...). Also, you need to mend potentially incompatible ABIs together when you mix and match different platforms. This is why I think there's a thesis or two to be done here; the rabbit hole is really that deep once you start digging. Sorry for the walls of text, but without literature on this I'm forced to build up my explanations from basic principles just so that people have a chance of following along. |
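To make the split MIPS_HI16/MIPS_LO16 relocations mentioned above a bit more concrete, here is a minimal sketch of the arithmetic only. It is not the extension's analyzer, and the function names are hypothetical. It assumes a 32-bit address loaded by a `lui` (high half) followed later by an instruction with a sign-extended 16-bit immediate such as `addiu` or a load/store offset (low half). Because the low half is sign-extended, the high half has to be pre-adjusted by a carry, which is why the two halves can only be relocated correctly as a pair.

```python
def split_hi_lo(addr):
    """Split a 32-bit address into (hi16, lo16) as stored in the instruction pair."""
    lo = addr & 0xFFFF
    # Adding 0x8000 compensates for the sign extension of the low half.
    hi = ((addr + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer, as addiu/lw/sw do."""
    return value - 0x10000 if value & 0x8000 else value


def join_hi_lo(hi, lo):
    """Recover the 32-bit target address from a matched HI16/LO16 pair."""
    return ((hi << 16) + sign_extend16(lo)) & 0xFFFFFFFF


if __name__ == "__main__":
    for addr in (0x80011234, 0x80018000, 0x8001FFFC):
        hi, lo = split_hi_lo(addr)
        print(f"{addr:#010x} -> hi={hi:#06x} lo={lo:#06x} -> {join_hi_lo(hi, lo):#010x}")
```

Note that for an address like 0x80018000 the high half comes out as 0x8002 rather than 0x8001, so looking at the `lui` immediate alone gives the wrong upper bits; the matching low half, wherever it ends up in the function, is needed to recover the real target.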
But let me stick with some basics. So, I can write and compile an x86 test.c program. Then, I use your extension and undo the linking. Then, I use the results to link again into a new executable? Are the executables identical? When does it break?
How much of a task is it to make it a standalone program? What about x64 support?