Hacker News new | ask | show | jobs
by Tobu 1680 days ago
The clever idea that makes manyclangs compress well is to store object files before they are linked, with each function and each variable in its own elf section so that changes are mostly local; addresses will indirect through sections and a change to one item won't cascade into moving every address.

I'm not sure the linking step they provide is deterministic/hermetic, if it is that would prove a decent way to compress the final binaries while shaving most of the compilation time. Maybe the manyclangs repo could store hashes of the linked binaries if so?

I'm not seeing any particular tricks done in elfshaker itself to enable this, the packfile system orders objects by size as a heuristic for grouping similar objects together and compresses everything (using zstd and parallel streams for, well, parallelism). Sorting by size seems to be part of the Git heuristic for delta packing: https://git-scm.com/docs/pack-heuristics

I'd like to see a comparison with Git and others listed here (same unlinked clang artifacts, compare packing and access): https://github.com/elfshaker/elfshaker/discussions/58#discus...

1 comments

Author here, I'd like to see such a comparison too actually, but I'm not in the position to do the work at the moment. We did some preliminary experiments at the beginning, but a lot changed over the course of the project and I don't know how well elfshaker fares ultimately against all the options out there. Some basic tests against git found that git is quite a bit slower (10s vs 100ms) during 'git add' and git checkout. Maybe that can be fixed with some tuning or finding appropriate options.
It would be interesting to compare to gitoxide tweaked to use zstd compression for packs.