| I'd like to add to your post.
> The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.
Specifically, the PT_LOAD segment does that. Other segments are used for other purposes. Linkers generally don't generate ELFs with PT_LOAD segments covering the section header table but one could patch the ELF so that the last PT_LOAD segment covers the table or even the entire file. That way the location of the .text section becomes reachable to the running program via the section header table.
There's also this surprisingly useful PT_NULL segment type. They're essentially just placeholders with undefined program header structure contents. Excellent targets for patching. Scripting the linker to output these segments proved to be quite difficult so I just asked for a linker command line option instead. LLVM and GNU ld weren't interested but mold quickly added this feature.
A PT_NULL segment allows patching in a PT_LOAD segment for any data or metadata the programmer needs. It's also possible to create custom segments just like GNU did since there's a truly massive numeric range reserved just for that. These two facts enable some really cool stuff: https://www.matheusmoreira.com/articles/self-contained-lone-...
> The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.
In addition to that, they must also be sorted! For some weird reason, PT_LOAD segments cannot be in arbitrary order even if they don't overlap. Violating these requirements causes some truly excruciating crashes. The executable would somehow segfault before a single instruction executed. This uber segfault brought the likes of GDB to its knees and I was reduced to pasting readelf output on stackoverflow. |
I'm not sure why you would want to find .text from within the program, but if you do, the linker (at least ld.bfd and ld.lld) defines the symbols __executable_start and _etext, which bracket the program code. Using linker-resolved symbols is much more reliable than parsing section headers.
> In addition to that, they must also be sorted! For some weird reason
For efficiency and simplicity when loading.
Re your article:
> Unless I can figure out a way to move the program header table to the end of the file without breaking everything
This should be doable, but you need to make sure that the program headers are mapped to memory (i.e., completely covered by a PT_LOAD segment).
I do wonder why you use program headers, though: depending on your exact use case, it might be easier to link against an object file containing only data (e.g., from objcopy) and use symbols instead.