Hacker News
by bloak 2719 days ago
Sometimes they will also make small changes to code that was generated earlier, typically by changing a branch instruction to point somewhere else.

Dynamic linkers may do that, too, though glibc doesn't normally do it, as far as I know. It prefers to update a pointer to code, which gives the same result without needing memory that is both writable and executable and without having to invalidate the instruction cache.

2 comments

Since the advent of ROP, glibc's approach isn't much more secure, though it is saner. A few years ago OpenBSD added the kbind syscall which provides the linker equivalent of W^X. kbind is a kernel-mediated memcpy operation that restricts the code permitted to write to a memory block--specifically, the last piece of code to write to it, which is invariably the linker.

C++ virtual functions are problematic for the same reasons. In C code I've started to avoid function pointers altogether in favor of switch-based dispatch, limiting an attacker to invoking a small, statically defined set of functions, not any arbitrary code in the address space. If I feel the problem demands heavily polymorphic code I'll pull in a scripting language like Lua.
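A minimal sketch of the switch-based dispatch idea, under stated assumptions: the operation codes and the `dispatch` function are hypothetical names made up for illustration, not anything from a real codebase. The point is only that a corrupted selector can reach one of the listed cases or the error path, whereas a corrupted function pointer can reach any address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes: the complete, statically known set of targets. */
enum op { OP_ADD = 0, OP_SUB = 1, OP_MUL = 2 };

/*
 * Dispatch on an integer selector instead of calling through a function
 * pointer. Returns 0 and stores the result in *out on success, or -1 if
 * the selector is not one of the known operations.
 */
int dispatch(int op, int a, int b, int *out)
{
    assert(out != NULL);

    switch (op) {
    case OP_ADD:
        *out = a + b;
        return 0;
    case OP_SUB:
        *out = a - b;
        return 0;
    case OP_MUL:
        *out = a * b;
        return 0;
    default:
        /* Unknown selector: refuse rather than jump somewhere unexpected. */
        return -1;
    }
}
```

Calling `dispatch(OP_ADD, 2, 3, &r)` stores 5 in `r` and returns 0; any selector outside the enum returns -1. The trade-off is that adding an operation means editing the switch, which is exactly the restriction being bought.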

From my delving into such things: on Linux, at least, the linker is a binary that is called before the program runs, so it's a separate program potentially modifying the code. It's an interesting philosophical question whether that still counts as self-modifying, though.
I appear to be getting downvoted.

I assume it's because you think I'm wrong about the separate program. Look up the man page for ld.so.

If you run the strings program on a dynamically linked program, the first thing it spits out should be the path to ld.so.

If you run that program without arguments, it even gives you a usage message.
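The path that strings turns up is stored in the ELF file's PT_INTERP program header, so it can also be read directly. Here is a rough sketch, assuming Linux with glibc's `<elf.h>`, a 64-bit ELF file, and a file byte order that matches the host. The function name `elf_interp` is made up for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the PT_INTERP path (the dynamic linker the kernel will load) of a
 * 64-bit ELF file into buf. Returns 0 on success, or -1 if the file cannot
 * be read, is not a 64-bit ELF, has no PT_INTERP (e.g. it is statically
 * linked), or the path does not fit in buf.
 */
int elf_interp(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    int rc = -1;

    if (f == NULL || len == 0)
        goto out;
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    /* Walk the program header table looking for the PT_INTERP entry. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            break;
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_filesz > len)
            break;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0)
            break;
        if (fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            break;
        /* The stored path includes its NUL terminator; enforce it anyway. */
        buf[ph.p_filesz - 1] = '\0';
        rc = 0;
        break;
    }

out:
    if (f != NULL)
        fclose(f);
    return rc;
}
```

Pointing this at a dynamically linked binary should fill `buf` with the same ld.so path that strings prints first. For a statically linked binary it returns -1, since there is no interpreter to name.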

The OP's point, though, was that the glibc linker (ld.so) doesn't modify code; it largely limits itself to modifying tables of pointers. For access to shared symbols the compiler emits code that indirects through these tables, such as through the global offset table (GOT). The machine code itself is mostly generated to use relative offsets, so the linker doesn't usually need to change static opcode operands.
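To make the pointer-table idea concrete, here is a toy model of lazy binding in plain C. It is only an illustration of the principle, not how glibc is implemented: `toy_got`, `square_stub`, `real_square` and the other names are hypothetical. The "call site" always indirects through a table slot, and resolution is a write to that data slot, never to code.

```c
#include <assert.h>

typedef int (*fn_t)(int);

/* The "real" function that a shared library would provide. */
static int real_square(int x) { return x * x; }

/* Counts how many times the toy resolver ran. */
static int resolve_count;

static int square_stub(int x);

/* Toy global offset table: one slot, initially pointing at the resolver stub. */
static fn_t toy_got[1] = { square_stub };

/*
 * Resolver stub: runs on the first call only. It "binds" the symbol by
 * overwriting the table slot (a data write), then forwards the call.
 */
static int square_stub(int x)
{
    assert(toy_got[0] == square_stub);  /* only reachable while unresolved */
    resolve_count++;
    toy_got[0] = real_square;
    return toy_got[0](x);
}

/* The call site: always an indirect call through the table. */
int call_square(int x) { return toy_got[0](x); }

/* Expose the resolver count so callers can observe the one-time binding. */
int resolutions(void) { return resolve_count; }
```

The first `call_square` goes through the stub and rebinds the slot; later calls go straight to `real_square`, and `resolutions()` stays at 1. The code bytes of `call_square` never change, which is why this scheme needs no writable-and-executable memory.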

It does this because ELF is a newer, more abstract executable format. By contrast, Windows and AIX use an older dynamic linking strategy that depends more heavily on the linker patching address constants embedded in the code, presumably because that preserves backward compatibility. I'm too young to have had first-hand experience with the details, but I do vaguely remember the Linux transition from a.out to ELF, and it seemed rather disruptive (though it was all magic to me).

But the Windows approach isn't rightly self-modifying code, either. It's more like a delayed compilation stage. Self-modifying code implies code that rewrites itself dynamically during runtime. Runtime normally means the ordinary course of program execution, as opposed to link time. From the perspective of the code, link time is a one- or two-time event--static linking and, optionally, dynamic linking--that initializes the application code prior to its first run.