Hacker News new | ask | show | jobs
by peterwaller-arm 1673 days ago
Author here. The executables shipped in manyclangs are release builds! The catch is that manyclangs stores object files pre-link. Executables are materialized by relinking after they are extracted with elfshaker.

The stored object files are compiled with -ffunction-sections and -fdata-sections, which ensures that insertions/deletions to the object file only have a local effect (they don't cause relative addresses to change across the whole binary).

As you observe, anything which causes significant non-local changes in the data you store is going to have a negative effect when it comes to compression ratio. This is why we don't store the original executables directly.

2 comments

Thank you for the explanation, so the pre-link storage is one of the magical ingredients, maybe mention this as well in the README?

Is this the reason why manyclang (using llvms cmake based build system) can be provided easily, but it would be more difficult for gcc? Or is the object -> binary dependency automatically deduced?

> maybe mention this as well in the README?

We've tweaked the readme, I hope it's clearer.

It would be great to provide this for gcc too. The project is new and we've just started out. I know less about gcc's build system and how hard it will be to apply these techniques there. It seems as though it should be possible though and I'd love to see it happen.

To infer the object->executable dependencies we currently read the compilation database and produce a stand-alone link.sh shell script, which gets packaged into each manyclangs snapshot.

Ah, the compilation database is where more magic originates from :)
Yes, this is less great than I would like! :( :)
Thanks. I had a use case in mind where LTO is enabled. Unfortunately the LTO step is quite expensive so relinking does not seem like a viable option. If I find some time I'll give it a try though.
ThinLTO can be pretty quick if you have enough cores, it might work. Not sure how well the LTO objects compress against each other when you have small changes to them. It might work reasonably.

manyclangs is optimized to provide you with a binary quickly. The binary is not necessarily itself optimized to be fast, because it's expected that a developer might want to access any version of it for the purposes of testing whether some input manifests a bug or has a particular codegen output. In that scenario, it's likely that the developer is able to reduce the size of the input such that the speed of the compiler itself is not terribly significant in the overall runtime. Therefore, I don't see LTO for manyclangs as such a significant win. But it is still hoped that the overall end-to-end runtime is good, and the binaries are optimized, just not with LTO.