by 10000truths 1147 days ago
With the advent of LTO, unity builds are mostly a band-aid for poor management of header files. The Linux kernel project was able to net a ~40% reduction in compilation CPU-time just by pruning the contents of some key header files [1].

It really boils down to two rules:

1. Don't declare anything in header files that is only used in one compilation unit. Internal structs and functions should be declared and defined in source files, and internal linkage used wherever possible. gcc and clang's -fvisibility=hidden is useful here.

2. The more frequently a header file is included (whether transitively or directly), the more it should be split up. If a "common" or "utility" header file is included in 10000 source files, then any struct, function, etc. that you add to that file will have to be parsed 10000 times by the compiler every time you build from scratch, even if only 10 source files actually use the struct/function that you added. gcc and clang's -H flag is useful here.
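To make rule 1 concrete, here is a minimal sketch in C. The file name counter.c and every identifier in it are hypothetical, made up for illustration. The struct and the helper live only in the source file and are marked static, so a matching counter.h would need to declare just the three public functions.

```c
/* counter.c: a hypothetical translation unit (all names are illustrative). */
#include <assert.h>
#include <stddef.h>

/* Internal type: defined here rather than in counter.h, because no other
   translation unit needs to know its layout. */
struct tally {
    size_t hits;
    size_t misses;
};

/* 'static' at file scope gives internal linkage: these names are not
   visible to other translation units and cannot collide with theirs. */
static struct tally g_tally;

static void tally_record(struct tally *t, int hit)
{
    assert(t != NULL);
    if (hit)
        t->hits++;
    else
        t->misses++;
}

/* The only symbols a counter.h would have to declare. */
void counter_record(int hit)
{
    tally_record(&g_tally, hit);
}

size_t counter_hits(void)
{
    return g_tally.hits;
}

size_t counter_misses(void)
{
    return g_tally.misses;
}
```

Note that static works at the language level, per translation unit, while -fvisibility=hidden changes the default ELF visibility of symbols with external linkage in a shared object, so the two complement each other rather than overlap. For rule 2, running something like gcc -H -c counter.c prints each header that gets pulled in, indented by nesting depth, which helps spot the headers that are included everywhere.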

[1] https://lore.kernel.org/lkml/YdIfz+LMewetSaEB@gmail.com/

2 comments

I think "just" is perhaps not the right word for something that took a senior dev over a year and more than 2000 commits just to get to an RFC patchset that doesn't compile for all architectures... Tremendous work, but it clearly wasn't easy or a matter of "follow these simple rules".

> unity builds are mostly a band-aid for poor management of header files

That's what it was always about (improving build times); better optimization is just a welcome side effect. But header hygiene is hard because the problem creeps back into the code base over time.

> The Linux kernel project was able to net a ~40% reduction in compilation CPU-time

Linux is a C codebase. Header hygiene is much easier in C, because C headers usually only contain interface declarations (usually at most a few hundred lines of function prototypes and struct declarations), while C++ headers often need to include implementation code inside template functions, or are plastered with inline functions (which in turn means more dependencies to include in the header). And even if the user headers are reasonably 'clean', they still often need to include C++ stdlib headers which then indirectly introduce the same problem.

For instance your point (2) only makes sense if this header doesn't need to include any of the C++ stdlib headers, which will add tens of thousands of lines of code to each compilation unit. For such cases you might actually make the problem worse by splitting big headers into many smaller ones.

PS: the most effective, but also most radical and controversial solution is also a very simple one: don't include headers in headers.
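One way to read that rule, sketched in C with hypothetical names (logger.h, config.h, struct config, logger_level): when a header only passes a type around by pointer, it can forward-declare the type instead of including the header that defines it. Only the source file that actually looks inside the struct includes the full definition. The sketch squeezes both files into one listing, with comments marking where each part would live.

```c
#include <assert.h>
#include <stddef.h>

/* ---- what a hypothetical logger.h would contain ---- */

/* Instead of '#include "config.h"', a forward declaration. The type is
   incomplete here, which is enough for declaring pointers to it. */
struct config;

int logger_level(const struct config *cfg);

/* ---- what a hypothetical logger.c would contain ---- */

/* The source file includes what it really needs. This inline definition
   stands in for '#include "config.h"' to keep the sketch in one file. */
struct config {
    int verbosity;
};

/* Clamp the configured verbosity to an assumed maximum of 3. */
int logger_level(const struct config *cfg)
{
    assert(cfg != NULL);
    return cfg->verbosity > 3 ? 3 : cfg->verbosity;
}
```

With this layout, files that include logger.h no longer pay for parsing config.h, and a change to the layout of struct config only forces a rebuild of the source files that include config.h directly. The cost is that every source file has to list its own includes in the right order, which is part of why the approach is controversial.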