

by aengelke 864 days ago

Nice visualization of the ELF headers. However, the article has a few inaccuracies:

- ELF files are used not only for executables, but also for object files, shared libraries, and core dumps. Different parts of the ELF format serve different purposes, although there is a lot of overlap.

- The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.

- Dynamic linking doesn't require section headers. The dynamic loader (ld.so) parses the program headers for a PT_DYNAMIC entry, which refers to the .dynamic section (which in turn refers to .dynsym, .dynstr, .rela.dyn, .init_array, etc.).

- Relocation sections (what is a relocation symbol?) are required for static linking, where every section with relocations gets its own relocation section, so .text gets .rela.text. Also, in object files, sections must use relocations to refer to other sections. Executables don't need to have relocations.

- The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.

- The first section table entry must be of type SHT_NULL. The magic value SHN_UNDEF (=0) is used to refer to undefined symbols, so referring to the first section in, e.g., the symbol table, is not possible.

Although not required for a minimal file, any "modern" ELF executable should have a PT_GNU_STACK program header with flags read+write, otherwise the stack will get mapped as an executable memory region, thereby creating a large and often avoidable attack vector.
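To check the points above on a real binary, here is a minimal Python sketch that reads the program headers directly, without touching section headers. It assumes a little-endian ELF64 file; the names `Phdr`, `read_program_headers`, `find_segment`, and `stack_marked_executable` are made up for illustration, while the numeric constants are the ones from `<elf.h>`.

```python
import struct
from typing import List, NamedTuple, Optional

# Numeric values as defined in the ELF specification and <elf.h>.
PT_LOAD = 1
PT_DYNAMIC = 2
PT_GNU_STACK = 0x6474E551
PF_X = 0x1


class Phdr(NamedTuple):
    """One Elf64_Phdr entry, fields in on-disk order."""
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def read_program_headers(data: bytes) -> List[Phdr]:
    """Parse the program header table of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_CLASS must be ELFCLASS64 (2) and EI_DATA must be ELFDATA2LSB (1).
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # In Elf64_Ehdr, e_phoff is at offset 0x20; e_phentsize and e_phnum
    # are two consecutive 16-bit fields starting at offset 0x36.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    return [
        Phdr(*struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize))
        for i in range(e_phnum)
    ]


def find_segment(headers: List[Phdr], p_type: int) -> Optional[Phdr]:
    """Return the first program header of the given type, or None."""
    return next((h for h in headers if h.p_type == p_type), None)


def stack_marked_executable(headers: List[Phdr]) -> Optional[bool]:
    """None if there is no PT_GNU_STACK entry, else whether PF_X is set."""
    stack = find_segment(headers, PT_GNU_STACK)
    if stack is None:
        return None
    return bool(stack.p_flags & PF_X)
```

Feeding it the bytes of a dynamically linked executable and calling `find_segment(headers, PT_DYNAMIC)` should return the entry that ld.so starts from. A `None` result from `stack_marked_executable` means the PT_GNU_STACK header is missing altogether, which is the case the paragraph above warns about.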


matheusmoreira 864 days ago

I'd like to add to your post.

> The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.

Specifically, the PT_LOAD segment does that. Other segments are used for other purposes. Linkers generally don't generate ELFs with PT_LOAD segments covering the section header table but one could patch the ELF so that the last PT_LOAD segment covers the table or even the entire file. That way the location of the .text section becomes reachable to the running program via the section header table.

There's also this surprisingly useful PT_NULL segment type. They're essentially just placeholders with undefined program header structure contents. Excellent targets for patching. Scripting the linker to output these segments proved to be quite difficult so I just asked for a linker command line option instead. LLVM and GNU ld weren't interested but mold quickly added this feature.

A PT_NULL segment allows patching in a PT_LOAD segment for any data or metadata the programmer needs. It's also possible to create custom segments just like GNU did since there's a truly massive numeric range reserved just for that. These two facts enable some really cool stuff:

https://www.matheusmoreira.com/articles/self-contained-lone-...

> The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.

In addition to that, they must also be sorted! For some weird reason, PT_LOAD segments cannot be in arbitrary order even if they don't overlap.

Violating these requirements causes some truly excruciating crashes. The executable would somehow segfault before a single instruction executed. This uber segfault brought the likes of GDB to its knees and I was reduced to pasting readelf output on Stack Overflow.
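A small sanity check before running a patched binary can save some of that pain. The sketch below is only an illustration: `Segment` and `check_load_segments` are hypothetical names, the values would be copied from `readelf -lW` output or any ELF parser, and the 4096-byte page size is an assumption. It checks the two rules the ELF specification gives for loadable segments: PT_LOAD entries appear in ascending `p_vaddr` order, and `p_vaddr` is congruent to `p_offset` modulo `p_align`. It also applies the same congruence modulo the page size, since the file is mapped with mmap.

```python
from typing import List, NamedTuple

PT_LOAD = 1  # value from <elf.h>


class Segment(NamedTuple):
    """The program header fields this check needs (hypothetical helper type)."""
    p_type: int
    p_offset: int
    p_vaddr: int
    p_align: int


def check_load_segments(segments: List[Segment], page_size: int = 4096) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found.

    page_size=4096 is an assumption; pass the real value for the target system.
    """
    problems = []
    loads = [s for s in segments if s.p_type == PT_LOAD]

    # Rule 1: PT_LOAD entries must be sorted by ascending p_vaddr.
    for prev, cur in zip(loads, loads[1:]):
        if cur.p_vaddr < prev.p_vaddr:
            problems.append(
                f"order: PT_LOAD at {cur.p_vaddr:#x} follows one at {prev.p_vaddr:#x}"
            )

    # Rule 2: p_vaddr and p_offset must be congruent modulo the alignment.
    # Taking the larger of p_align and the page size covers both the ELF rule
    # and the mmap constraint, assuming both values are powers of two.
    for s in loads:
        modulus = max(s.p_align, page_size)
        if (s.p_vaddr - s.p_offset) % modulus != 0:
            problems.append(
                f"alignment: vaddr {s.p_vaddr:#x} and offset {s.p_offset:#x} "
                f"differ modulo {modulus:#x}"
            )
    return problems
```

Non-PT_LOAD entries are ignored, so the full program header table can be passed in unchanged.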


aengelke 864 days ago

> That way the location of the .text section becomes reachable to the running program via the section header table.

Not sure why you would want to find .text in the program, but if you do, the linker (at least ld.bfd and ld.lld do) adds the symbols __executable_start and _etext, which surround the program code. Using linker-resolved symbols is much more reliable than parsing section headers.

> In addition to that, they must also be sorted! For some weird reason

For efficiency and simplicity when loading.

Re your article:

> Unless I can figure out a way to move the program header table to the end of the file without breaking everything

This should be doable, but you need to make sure that the program headers are mapped to memory (i.e., completely covered by a PT_LOAD segment).
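As a rough illustration of that condition, the check below tests whether the byte range occupied by the program header table lies entirely inside the file range of a single PT_LOAD segment. The function name and the argument layout are made up for this sketch; the inputs are the `e_phoff`, `e_phentsize`, and `e_phnum` fields of the ELF header plus the `(p_offset, p_filesz)` pairs of the PT_LOAD entries.

```python
from typing import Iterable, Tuple


def phdr_table_is_mapped(
    e_phoff: int,
    e_phentsize: int,
    e_phnum: int,
    load_segments: Iterable[Tuple[int, int]],
) -> bool:
    """True if one PT_LOAD segment's file range covers the whole table.

    load_segments holds (p_offset, p_filesz) pairs for PT_LOAD entries only.
    """
    table_start = e_phoff
    table_end = e_phoff + e_phentsize * e_phnum  # exclusive end offset
    return any(
        p_offset <= table_start and table_end <= p_offset + p_filesz
        for p_offset, p_filesz in load_segments
    )
```

With the usual layout, where the table directly follows the ELF header and the first PT_LOAD starts at file offset 0, this returns `True`. After moving the table to the end of the file it only returns `True` once some PT_LOAD segment has been extended or added to cover the new location.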

I do wonder why you use program headers, though: depending on your exact use case, it might be easier to link against an object file containing only data (e.g., from objcopy) and use symbols instead.


matheusmoreira 863 days ago

> Not sure why you would want to find .text in the program

Not sure either, I just used the example I read in your post. Now I'm curious about why someone would want to do that. Maybe to make the section writable and patch the code at runtime?

> __executable_start and _etext, which surround the program code.

You're right! I did see those symbols when I dumped ld's default linker script. Completely forgot about them.

> This should be doable, but you need to make sure that the program headers are mapped to memory (i.e., completely covered by a PT_LOAD segment).

I'll keep this in mind when I try it again. At the time I got pretty frustrated because it was pretty hard to debug and figure out why it was failing. The mold solution was like a light at the end of the tunnel for me.

> I do wonder why you use program headers, though: depending on your exact use case, it might be easier to link against an object file containing only data (e.g., from objcopy) and use symbols instead.

Objcopy was the first thing I tried! Even asked a question about it on Stack Overflow.

https://stackoverflow.com/q/77468641

Long story short, by default the sections aren't covered by a PT_LOAD segment and so they are unreachable. I wanted the program to work even if the symbols were not defined which is why I tried to find it in the table at runtime.
