Hacker News
by Someone 390 days ago
Fun article, but the resulting code is extremely brittle:

- assumes x86_64

- makes the invalid assumption that functions get compiled into a contiguous range of bytes (I’m not aware of any compiler that violates that, but especially with profile-guided optimization or compilers that try to minimize program size, that may not be true, and there is nothing in the standard that guarantees it)

- assumes (as the article acknowledges) that “to determine the length of foo(), we added an empty function, bar(), that immediately follows foo(). By subtracting the address of bar() from foo() we can determine the length in bytes of foo().” Even a simple “align all functions at cache lines” policy slightly violates that, and I can see a compiler or a linker moving the otherwise unused bar away from foo for various reasons.

- makes assumptions about the OS it is running on.

- makes assumptions about the instructions that its source code gets compiled into. For example, in the original example, a sufficiently smart compiler could compile

  void foo(void) {
    int i=0;
    i++;
    printf("i: %d\n", i);
  }

as

  void foo(void) {
    printf("i: 1\n");
  }

or maybe even

  void foo(void) {
    puts("i: 1");
  }
Changing compiler flags can already break this program.
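
To make the layout point concrete: a program like this could at least sanity-check the "bar() immediately follows foo()" assumption before patching itself. A minimal sketch, assuming a platform where function pointers convert to uintptr_t (ISO C does not guarantee that). The helper names apparent_length and layout_plausible, and any bound you pass in, are made up for illustration and are not from the article.

```c
#include <stdint.h>
#include <stdio.h>

/* The function from the article that would be patched at run time. */
void foo(void) {
    int i = 0;
    i++;
    printf("i: %d\n", i);
}

/* Marker function that is only *hoped* to be placed right after foo(). */
void bar(void) {}

/*
 * Signed distance in bytes from `first` to `next`.
 * Converting function pointers to integers is implementation-defined,
 * so this is a best-effort measurement, not a guaranteed one.
 */
intptr_t apparent_length(void (*first)(void), void (*next)(void)) {
    return (intptr_t)((uintptr_t)next - (uintptr_t)first);
}

/*
 * Returns 1 if the measured length is positive and no larger than
 * max_len, 0 otherwise. A negative or huge value means the compiler or
 * linker reordered the functions. Passing this check still does not
 * prove that foo() is contiguous or that no padding was inserted.
 */
int layout_plausible(intptr_t len, intptr_t max_len) {
    return len > 0 && len <= max_len;
}
```

Calling layout_plausible(apparent_length(foo, bar), 4096) and refusing to patch when it returns 0 catches the reordering case, but alignment padding between the two functions would still be counted as part of foo().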

Also, why does this example work without flushing the instruction cache after modifying the code?

4 comments

For the mainstream OSes (Windows, OSX, Linux, Android), you don't need to flush the instruction cache on most x86 CPUs after modifying the code segment dynamically, but you do on ARM and MIPS.

This has burned me before while writing a binary packer for Android.
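
For anyone hitting the same thing: GCC and Clang both provide __builtin___clear_cache for this. A minimal sketch, assuming one of those compilers and assuming the destination has already been made writable (for example with mprotect, not shown here). The helper name patch_code is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes of new machine code over dst, then ask the compiler
 * runtime to synchronize the instruction cache for that range.
 * On x86 the builtin typically emits no instructions; on ARM and MIPS
 * it performs the actual cache maintenance.
 * Assumption: dst is writable, and will be executable by the time the
 * patched code is run.
 */
void patch_code(unsigned char *dst, const unsigned char *src, size_t n) {
    memcpy(dst, src, n);
    __builtin___clear_cache((char *)dst, (char *)dst + n);
}
```

Skipping the second line of the helper is exactly the kind of bug that goes unnoticed on x86 and then shows up as stale instructions on ARM.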

The author clearly explained that the whole article is more a demonstration for illustrative purposes than anything else.

> Changing compiler flags can already break this program.

That's not the point of the article.

They check all those assumptions by disassembling the code.

> self-modifying code > brittle

I mean, that is very much to be expected, unless someone comes up with a programming language that fully embraces the concept.