by rapidlua 1933 days ago
I wonder if they do indeed compute a checksum of the binary to come up with an AOT-translation cache key. Must be quite inefficient.

Not if your SSD delivers data at more than 2500MB/sec, as the one in the MacBook Air with M1 does. SHA256 calculation can probably be performed at that speed as well, thanks to dedicated silicon, so even a large 250MB binary (haven't seen those in any other use case than browsers) would be hashed in a tenth of a second. Not noticeable at all, if it's just once, at startup.
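A back-of-the-envelope sketch of that estimate, assuming reading and hashing are pipelined so that the slower stage sets the pace. The function name and the pipelining assumption are illustrative, not anything Rosetta is known to do; the figures are the ones quoted above.

```python
def hash_time_seconds(size_mb: float, read_mb_per_s: float, hash_mb_per_s: float) -> float:
    """Estimate the time to hash a file when reading and hashing overlap.

    Assumption: the two stages run as a pipeline, so the slower one
    bounds the overall throughput.
    """
    bottleneck = min(read_mb_per_s, hash_mb_per_s)
    return size_mb / bottleneck


# Figures from the comment: a 250 MB binary, 2500 MB/s for both stages.
estimate = hash_time_seconds(250, 2500, 2500)
print(f"{estimate:.2f} s")  # prints "0.10 s"
```

If the hash stage is much slower than the SSD, the same formula shows the estimate growing accordingly, which is where the whole argument hinges.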
This also sounds like something that could be cached if the file doesn’t change.
I think there is no good way of doing that. Each time the user tries to run an x86_64 binary you’d have to actually checksum, or otherwise check the content of, the x86_64 binary to know if you have a translated version of it already.
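A minimal sketch of what such a content-keyed lookup could look like, purely to illustrate the point that every launch has to read the whole binary. The cache layout, the `.aot` suffix, and the function names are hypothetical; nothing here reflects how Rosetta actually stores translations.

```python
import hashlib
from pathlib import Path
from typing import Optional


def content_key(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_translation(binary: Path, cache_dir: Path) -> Optional[Path]:
    """Look up a cached translation by content hash (hypothetical layout).

    The full binary is hashed on every call, even on a cache hit.
    """
    candidate = cache_dir / (content_key(binary) + ".aot")
    return candidate if candidate.exists() else None
```

Any change to the binary's bytes yields a different key, so a stale translation is never returned, at the cost of one full read per launch.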

inode metadata such as timestamps are insufficient, I think. They can be tampered with.
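To illustrate the tampering concern, here is a small sketch in which a file's content changes while its size and modification time stay the same. The helper names are made up for the example, and the signature deliberately ignores ctime, which is harder to forge than mtime.

```python
import hashlib
import os
from pathlib import Path


def metadata_signature(path: Path) -> tuple:
    """A cheap cache key built only from size and mtime (for illustration)."""
    st = path.stat()
    return (st.st_size, st.st_mtime_ns)


def content_digest(path: Path) -> str:
    """SHA-256 of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def overwrite_preserving_mtime(path: Path, new_content: bytes) -> None:
    """Replace the content, then restore the original access/modify times."""
    st = path.stat()
    path.write_bytes(new_content)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

After `overwrite_preserving_mtime` is called with replacement bytes of the same length, `metadata_signature` is unchanged while `content_digest` differs, so a cache keyed on that metadata alone would serve a stale translation.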

> inode metadata such as timestamps are insufficient, I think. They can be tampered with.

In macOS, there is a security-policy layer of some kind on top of xattrs, separate from the security-policy of the file itself. `com.apple.rootless` is an example of an xattr protected by this mechanism: users (even root) can't apply or remove `com.apple.rootless` from files on a filesystem mounted as the rootfs.

With this mechanism, it'd likely be possible to give executable binaries an xattr containing the checksum, generated by Gatekeeper+Rosetta, that the user couldn't modify, while still being able to otherwise modify/delete the file. (And, presumably, modifying the file would automatically invalidate/remove the checksum xattr.)

The kernel is aware when files are modified.
That's not generally applicable, e.g. not if files are on an external drive, or worse, a network filesystem (where they can change even during use).
In which case falling back on a hash is fine.
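The policy this sub-thread converges on could be sketched roughly as follows: trust a protected, stored checksum when the file lives on a volume where the kernel sees every modification, and hash the content otherwise. Everything here is hypothetical: `stored_checksums` is a plain dict standing in for the protected xattr, `on_trusted_volume` is a flag the caller supplies, and `read_file` is an injected reader.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def cache_key(
    path: str,
    read_file: Callable[[str], bytes],
    on_trusted_volume: bool,
    stored_checksums: Dict[str, str],
) -> str:
    """Pick a cache key, avoiding a full read when a trusted checksum exists."""
    if on_trusted_volume:
        stored = stored_checksums.get(path)
        if stored is not None:
            # Fast path: no file content is read at all.
            return stored
    # Slow path: external or network volume, or no stored checksum yet.
    return sha256_hex(read_file(path))
```

The fast path is only sound if the stored checksum is reliably removed whenever the file is written, which is the invalidation behaviour the xattr comment above presumes.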
Is it? Checksums are something that I use rather than understand, but the CPU is doing billions of instructions per second these days, and the hash only happens once.
Run `openssl speed sha256` to get an idea of how high the latency for a cache hit would be. I see a throughput of ~300MiB/sec. This can be parallelised easily, but we are still burning lots of CPU cycles for nothing. Bad for battery life.

https://gist.github.com/nonylene/d08977e8952c83d20d2c5f7cbaf...
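For anyone without `openssl` at hand, a rough equivalent of that measurement using Python's `hashlib`. This is a sketch: it hashes an in-memory buffer on a single thread, so it ignores disk reads, and the numbers will vary by machine and by whether the underlying library uses hardware SHA instructions. The function names are illustrative.

```python
import hashlib
import time


def sha256_throughput_mib_per_s(total_mib: int = 64, chunk_mib: int = 1) -> float:
    """Measure single-threaded SHA-256 throughput over zero-filled chunks."""
    chunk = bytes(chunk_mib * 1024 * 1024)
    rounds = total_mib // chunk_mib
    digest = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(rounds):
        digest.update(chunk)
    elapsed = time.perf_counter() - start
    return (rounds * chunk_mib) / elapsed


def cache_hit_latency_s(binary_mib: float, throughput_mib_per_s: float) -> float:
    """Time spent hashing a binary of the given size on each cache hit."""
    return binary_mib / throughput_mib_per_s


throughput = sha256_throughput_mib_per_s()
print(f"SHA-256: {throughput:.0f} MiB/s")
print(f"250 MiB binary: {cache_hit_latency_s(250, throughput):.2f} s per cache hit")
```

At the ~300MiB/sec quoted above, a 250 MiB binary would cost a bit under a second per launch; at the 2500MB/sec assumed earlier in the thread, about a tenth of that.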