by pizlonator
713 days ago
No. The key observation of BOLT is that by collecting profiling on an optimized binary and then mapping the profiling onto a compiler's decompilation of that optimized binary, you get better profiling fidelity.

My intuition for why BOLT works is that:

- If you try to profile an unoptimized (or even insufficiently optimized) binary, you don't get accurate profiling because the timings are different.

- If you try to profile an optimized binary and then rerun the compiler from source using that profiling data, then you'll have a bad time mapping the profiler's observations back to what the source looked like. This is because the compiler pipeline will have done many transforms - totally changing control flow layout in some cases - that make some of the profiling meaningless when you try to inject it before those optimizations happened.

But BOLT injects the profiling data into the code exactly as it was at time of profiling, i.e. the binary itself.

It's totally insane, wacky, and super fucking cool - these folks should be hella proud of themselves.
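To make the "inject the profile into the binary itself" point concrete, here is a toy Python sketch. It is not BOLT's code or data format: the block table, addresses, and the names `block_for` and `attribute` are all made up for illustration. It only shows that when samples are addresses in the optimized binary and the blocks are address ranges in that same binary, attribution is a plain range lookup with nothing to guess.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical basic blocks of an optimized binary: (start_address, label).
# Sorted by start address; each block runs until the next block's start.
BLOCKS = [
    (0x1000, "entry"),
    (0x1010, "loop_body"),
    (0x1040, "error_path"),
    (0x1050, "exit"),
]
END = 0x1060  # first address past the last block (assumed)
STARTS = [start for start, _ in BLOCKS]


def block_for(address):
    """Return the label of the block containing address, or None if outside."""
    if not (STARTS[0] <= address < END):
        return None
    return BLOCKS[bisect_right(STARTS, address) - 1][1]


def attribute(samples):
    """Count samples per block, dropping samples that fall outside the code."""
    labels = (block_for(address) for address in samples)
    return Counter(label for label in labels if label is not None)


# Made-up sampled instruction addresses, as a sampling profiler might report.
samples = [0x1012, 0x1020, 0x103C, 0x1004, 0x1018, 0x1052, 0x2000]
counts = attribute(samples)
print(counts.most_common())
```

Doing the same attribution against source or early IR would need a mapping back through every transform the optimizer applied, which is where the fidelity gets lost.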
Maybe the bigger question is at what point the profiles get fed back. Since a compiler generates many object files that are then linked into the final binary, you'd probably want to apply the profile in the linker rather than earlier on.
I guess the kernel specifically adds an extra layer of complexity. It looks like they use `perf` to record the profile, which is cool, and then they apply the results to the binary, which is also cool.
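A toy Python sketch of why the feedback point matters, under loud assumptions: the object files, function names, and sample counts are invented, and sorting by raw hotness is a stand-in rather than what any real linker or post-link optimizer does. It compares ordering functions inside each object file separately with ordering them once across the whole binary.

```python
# Hypothetical per-function sample counts, keyed by (object_file, function).
profile = {
    ("a.o", "parse"): 900,
    ("a.o", "usage"): 2,
    ("b.o", "eval"): 1500,
    ("b.o", "dump_debug"): 0,
    ("c.o", "alloc"): 700,
}


def per_object_layout(profile):
    """Sort by hotness inside each object; objects stay in link order."""
    layout = []
    for obj in sorted({obj for obj, _ in profile}):
        funcs = [key for key in profile if key[0] == obj]
        layout += sorted(funcs, key=lambda key: -profile[key])
    return layout


def whole_binary_layout(profile):
    """Sort every function in the binary by hotness in one pass."""
    return sorted(profile, key=lambda key: -profile[key])


def hot_span(layout, profile, threshold=100):
    """Number of layout slots between the first and last hot function."""
    hot = [i for i, key in enumerate(layout) if profile[key] >= threshold]
    return hot[-1] - hot[0] + 1


print(hot_span(per_object_layout(profile), profile))
print(hot_span(whole_binary_layout(profile), profile))
```

With these made-up numbers the per-object ordering leaves cold functions interleaved between the hot ones, while the whole-binary ordering packs the hot functions together, which is the kind of decision that needs a view of the final linked image.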