Hacker News new | ask | show | jobs
by a1369209993 1850 days ago
> you can hide instructions in there and make it intractable for a static analyzer to determine whether they are really instructions or just data.

Uh, no? This is very tractable - O(N) in the size of the binary - just check, for every single byte offset in executable memory, whether that offset, if jumped to or continued to from the previous instruction, would decode into a `msr s3_5_c15_c10_1, reg` or `mrs reg, s3_5_c15_c10_1` instruction.

IIUC, the decoding of a M1 ARM instruction doesn't depend on anything other than the instruction pointer, so you only need one pass, and you only need to decode one instruction, since the following instruction will occur at a later byte address.

Edit: unless its executable section isn't read-only, in which case static analyzers can't prove much of anything with any real confidence.

1 comments

Yes but if program constants are in executable memory, then you can end up with byte sequences that represent numeric values but also happen to decode into the problematic instructions.

For example, this benign line of code would trip a static analyzer looking for `msr s3_5_c15_c10_1, x15` in the way you described:

  uint32_t x = 0xd51dfa2f;
I said false positives are an issue in the context of a "dumb" real-time kernel-side scan. App Store submission is different. They can afford to have false positives and have a human look at them to see if they look suspicious.

There are 26 fixed bits in the problem instructions, which means a false positive rate of one in 256MiB of uniformly distributed constant data (the false positive rate is, of course, zero for executable code, which is the majority of the text section of a binary). Constant data is not uniformly distributed. So, in practice, I expect this to be a rather rare occurrence.

I just looked at some mac binaries, and it seems movk and constant section loads have largely superseded arm32 style inline constant pools. I still see some data in the text section, but it seems to mostly be offset tables before functions (not sure what it is, might have to do with stack unwinding), none of which seems like it could ever match the instruction encoding for that register. So in practice I don't think any of this will be a problem. It seems this was changed in gcc in 2015 [0], I assume LLVM does the same.

[0] https://gcc.gnu.org/pipermail/gcc-patches/2015-November/4334...

That makes sense. I'm glad to be wrong :-)