Hacker News
by rbrown46 1135 days ago
I’ve gotten good insight into what takes up space in binaries by profiling with Bloaty McBloatface. My last profiling session showed that clang’s ThinLTO was inlining too aggressively in some cases, causing functions that should be tiny to be 75 kB+.

https://github.com/google/bloaty

3 comments

I spent a lot of time with bloaty for our embedded application and found I got more actionable output from something like this:

nm -B -l -r --size-sort --print-size -t d ./path/to/compiler/output{.so} | c++filt > /tmp/by_size

Just a lot of flags that show you the size of each symbol in decimal, with demangled names. Run it before you run `strip` in your CI pipeline, or whatever step preps a build for proper release.
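If the flat list in /tmp/by_size gets too long to eyeball, a small script can roll it up per source file. This is only a rough sketch: it assumes each line looks like "address size type name", optionally followed by a tab and "file:line" (which is what I would expect from `-l`), and the function names `parse_nm_line` and `size_by_source_file` are made up for illustration.

```python
from collections import defaultdict


def parse_nm_line(line):
    """Parse one line of the nm output; return (size, type, name, location) or None.

    Assumed layout: "<address> <size> <type> <name>[\t<file>:<line>]",
    with address and size in decimal (because of `-t d`).
    """
    parts = line.rstrip("\n").split(None, 3)
    if len(parts) < 4:
        return None  # not a symbol line we understand
    _address, size, sym_type, rest = parts
    if not size.isdigit():
        return None
    # Demangled names may contain spaces, so only a tab separates name and location.
    name, _, location = rest.partition("\t")
    return int(size), sym_type, name.strip(), location.strip()


def size_by_source_file(lines):
    """Sum symbol sizes per source file, largest total first."""
    totals = defaultdict(int)
    for line in lines:
        parsed = parse_nm_line(line)
        if parsed is None:
            continue
        size, _sym_type, _name, location = parsed
        # Drop the trailing ":<line>"; symbols without debug info land in "(unknown)".
        path = location.rpartition(":")[0] or location or "(unknown)"
        totals[path] += size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Something like `size_by_source_file(open("/tmp/by_size"))` would then give a list of (path, total bytes) pairs. Symbols built without debug info have no location and are lumped together.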

I agree, bloaty is good at giving a quick overview, but the difficult part is drilling through the symbols to find out what the heck is happening. For that, nm/objdump/readelf are irreplaceable.

If you can run PGO, it will take the profiling information into account in its inlining heuristics, which can help a lot in some cases. Technically PGO optimizes for speed rather than size, though, so if you really care specifically about binary size you'd probably still have to muck about with noinline attributes and such.

Unfortunately, PGO done the default way is antithetical to reproducible builds. You can avoid that by putting the profiling data in your VCS, but then you suffer all the consequences of a version-controlled binary blob, one heavily dependent on other files at that.

Perhaps it should be possible to use profiling data to keep human-managed {un,}likely or {hot,cold} annotations up to date? How valuable are PGO’s frequencies compared to these discrete-valued labels? (I know GCC allows you to specify frequencies in the source, but that sounds less than convenient.)

Bloaty is a nice tool.

When I worked on Matter a couple of years ago, we had the problem that Bloaty's disassembly backend, Capstone (http://www.capstone-engine.org/), did not support Xtensa, so we produced some Python tools that could take output from bloaty, or similar data from readelf or elftools, and produce several kinds of report.

https://github.com/project-chip/connectedhomeip/blob/master/...
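To give a flavour of what such a report can look like, here is a minimal sketch of one kind: a growth report between two builds. It is not taken from the linked tools. It assumes size data exported as CSV (for example via bloaty's `--csv` option), and the column names `symbols` and `filesize` as well as the function names are assumptions you would adjust to your own data.

```python
import csv
import io


def load_sizes(csv_text, name_column="symbols", size_column="filesize"):
    """Read CSV text into a {name: size} dict.

    The column names are assumptions; pass different ones if your export differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[name_column]: int(row[size_column]) for row in reader}


def growth_report(before, after):
    """Return (name, delta) pairs for entries whose size changed, largest growth first."""
    names = set(before) | set(after)
    deltas = [(name, after.get(name, 0) - before.get(name, 0)) for name in names]
    changed = [item for item in deltas if item[1] != 0]
    # Sort by descending delta, then by name so the output is stable.
    return sorted(changed, key=lambda item: (-item[1], item[0]))
```

Feeding it the CSV text from an old and a new build shows which symbols grew, which is often a more useful thing to put in CI than absolute sizes.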