HN Mirror (Hacker News)

by pizlonator 713 days ago

No. The key observation of BOLT is that by collecting profiling on an optimized binary and then mapping the profiling onto a compiler's decompilation of that optimized binary, you get better profiling fidelity.

My intuition for why BOLT works is that:

- If you try to profile an unoptimized (or even insufficiently optimized) binary, you don't get accurate profiling because the timings are different.

- If you try to profile an optimized binary and then rerun the compiler from source using that profiling data, then you'll have a bad time mapping the profiler's observations back to what the source looked like. This is because the compiler pipeline will have done many transforms - totally changing control flow layout in some cases - that make some of the profiling meaningless when you try to inject it before those optimizations happened.
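Here's a toy illustration of the second bullet, under made-up assumptions: the block names, counts, and the choice of tail duplication as "the transform" are all hypothetical. Counts sampled on the optimized CFG are projected back onto source-level blocks using the kind of origin info debug data gives you. The per-copy branch bias, which is what made the duplication worthwhile, gets averaged away.

```python
from collections import defaultdict

# Hypothetical edge counts sampled from an optimized binary. Tail duplication
# cloned source block "D" into "D.1" (reached from B) and "D.2" (reached from C);
# both copies end in the same conditional branch to "E" or "F".
optimized_edges = {
    ("A", "B"): 90, ("A", "C"): 10,
    ("B", "D.1"): 90, ("C", "D.2"): 10,
    ("D.1", "E"): 85, ("D.1", "F"): 5,
    ("D.2", "E"): 0, ("D.2", "F"): 10,
}

# Debug info can tell us which source block each machine block came from.
origin = {"A": "A", "B": "B", "C": "C", "D.1": "D", "D.2": "D", "E": "E", "F": "F"}


def project_to_source(edges, origin):
    """Sum optimized-CFG edge counts onto the pre-optimization CFG."""
    projected = defaultdict(int)
    for (src, dst), count in edges.items():
        projected[(origin[src], origin[dst])] += count
    return dict(projected)


def taken_probability(edges, block, target):
    """Fraction of the exits from `block` that go to `target`."""
    total = sum(count for (src, _), count in edges.items() if src == block)
    return edges.get((block, target), 0) / total


source_edges = project_to_source(optimized_edges, origin)

# In the binary, the copy of D reached from C never branches to E ...
print(taken_probability(optimized_edges, "D.2", "E"))  # 0.0
# ... but after projection D looks 85% biased toward E, and that merged bias is
# all a recompile has to go on if it duplicates D again.
print(taken_probability(source_edges, "D", "E"))  # 0.85
```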

But BOLT injects the profiling data into the code exactly as it was at the time of profiling, i.e. the binary itself.
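A rough sketch of what "injecting into the binary" buys you, assuming a hypothetical function whose basic-block boundaries were recovered by disassembly (all addresses and labels are invented). Each sampled branch record (from-address, to-address) lands in exactly the block it was sampled from, so there's no guessing about what the compiler did. Real tooling does more than this; for instance, it can also credit fall-through paths between consecutive branch records, which this sketch ignores.

```python
import bisect
from collections import Counter

# Hypothetical basic-block boundaries recovered by disassembling one function
# of the optimized binary: (start address, block label), sorted by address.
blocks = [(0x1000, "entry"), (0x1010, "loop.head"), (0x1024, "loop.body"),
          (0x1040, "cold.path"), (0x1060, "exit")]
FUNCTION_END = 0x1070  # first address past the function (assumed)

starts = [addr for addr, _ in blocks]


def block_of(addr):
    """Return the label of the basic block containing `addr`, or None."""
    if addr < starts[0] or addr >= FUNCTION_END:
        return None
    i = bisect.bisect_right(starts, addr) - 1
    return blocks[i][1]


def aggregate(branch_samples):
    """Turn sampled (from_addr, to_addr) branch records into CFG edge counts."""
    edges = Counter()
    for src, dst in branch_samples:
        a, b = block_of(src), block_of(dst)
        if a is not None and b is not None:
            edges[(a, b)] += 1
    return edges


# Hypothetical branch records of the kind hardware branch sampling produces.
samples = [(0x103C, 0x1010)] * 3 + [(0x1020, 0x1040), (0x1058, 0x1060)]
print(aggregate(samples))
```

Because the counts are keyed by the binary's own blocks, a layout pass that works on that same binary can consume them directly.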

It's totally insane, wacky, and super fucking cool - these folks should be hella proud of themselves.


nsguy 713 days ago

In theory you can get a pretty good idea of where instructions came from in the source code, though the optimizer does, shall we say, obfuscate/spread that a little bit (which is why debugging through optimized code or looking at core dumps from optimized code can be tricky, but you can still mostly do it; there's just some lack of precision in the mapping back).
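A small sketch of that "lack of precision", assuming a hypothetical line table in the spirit of DWARF's address-to-line mapping (addresses and line numbers are invented). After optimization, one source line's code is split across two non-contiguous ranges, and one range is tagged line 0, which DWARF uses for code that can't be attributed to any single source line. Samples still map back to lines, but the per-line totals hide which copy was hot, and some samples map to nothing useful.

```python
import bisect
from collections import Counter

# Hypothetical line table: each row means "from this address onward, code
# belongs to this source line". Line 12's code sits in two separate ranges,
# and one range is tagged line 0 (no single source line applies).
line_table = [(0x2000, 10), (0x2008, 12), (0x2010, 14), (0x2018, 12),
              (0x2020, 0), (0x2028, 16)]
row_addrs = [addr for addr, _ in line_table]


def line_for(addr):
    """Source line of the last line-table row at or before `addr`."""
    i = bisect.bisect_right(row_addrs, addr) - 1
    return line_table[i][1] if i >= 0 else None


# Hypothetical sampled instruction addresses.
samples = [0x2009, 0x200C, 0x2019, 0x2022, 0x2024, 0x2029]
per_line = Counter(line_for(a) for a in samples)
# Line 12 collects samples from both of its ranges; line 0 samples are unattributable.
print(per_line)
```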

Maybe the bigger problem is at what point the profiles feed back. Since a compiler may generate many object files, which are then linked to form the final binary, you'd sort of maybe want to do this in the linker vs. earlier on.

I guess specifically with the kernel there's an extra layer of complexity. It looks like they use `perf` to record the profile which is cool. And then they apply the results to the binary which is also cool.


pizlonator 713 days ago

> In theory you can get a pretty good idea of where instructions came from in the source code, though the optimizer does, shall we say, obfuscate/spread that a little bit (which is why debugging through optimized code or looking at core dumps from optimized code can be tricky, but you can still mostly do it; there's just some lack of precision in the mapping back).

I think the whole point of BOLT is that in practice, you can't get a good idea of where instructions came from.

And it's not even about instructions as much as control flow. LLVM, GCC, and other good compilers (like the ones I wrote for JSC) can and absolutely will fuck up the control flow graph for fun and profit. So if the point of FDO is to create better code layout, then feeding the profiling samples into the compiler at a point before it did its fuckery will put you up shit creek without a paddle: the basic blocks and branches that the profiler is telling you about don't exist yet, and won't until the compiler does its thing.
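To make "better code layout" concrete, here's a simplified greedy chain-merging pass in the spirit of Pettis and Hansen's block-positioning algorithm. It's a stand-in for illustration, not what BOLT actually ships, and the block names and counts are hypothetical. The key point is that it needs edge counts on the very blocks it's laying out; if the profile names blocks that don't exist yet, there's nothing to attach the counts to.

```python
def layout(blocks, edge_counts, entry):
    """Order blocks so the hottest edges become fall-throughs (greedy sketch)."""
    chain_of = {b: [b] for b in blocks}  # every block starts in its own chain
    # Visit edges hottest first (sorted() is stable, so ties keep dict order).
    for (src, dst), _ in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        # Glue chains only if src ends one chain and dst starts a different one,
        # so src can fall through into dst; never put anything before the entry.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Emit the entry's chain first, then the remaining chains in block order.
    seen, order = set(), []
    for chain in [chain_of[entry]] + [chain_of[b] for b in blocks]:
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order


# Hypothetical profile: "check" almost always branches to "hot".
blocks = ["entry", "check", "cold", "hot", "exit"]
edge_counts = {("entry", "check"): 100, ("check", "cold"): 2,
               ("check", "hot"): 98, ("hot", "exit"): 98, ("cold", "exit"): 2}
print(layout(blocks, edge_counts, "entry"))
# ['entry', 'check', 'hot', 'exit', 'cold']
```

The rarely taken block ends up at the end, out of the hot path's way.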

You could try to run the compiler forward until it recreates the control flow structure that the profiler is talking about, but that seems hella sketchy since at that point you're leaning on the compiler's determinism in a way that would make me (and probably others) uncomfortable. It would rule out running BOLT on binaries optimized with PGO and it would create super nasty requirements for build system maintainers.
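One way to see why that scheme leans on determinism: the hypothetical guard below (names and CFGs invented, not how any real tool works) only trusts a profile if the recompiled CFG is structurally identical to the one the profile was collected on. Any different decision on the rebuild, such as peeling a loop this time, makes the whole profile unusable.

```python
import hashlib


def cfg_fingerprint(cfg):
    """Hash a CFG given as {block: [successors]}, independent of dict order."""
    blocks = sorted(cfg)
    edges = sorted((src, dst) for src, succs in cfg.items() for dst in succs)
    return hashlib.sha256(repr((blocks, edges)).encode()).hexdigest()


def apply_profile(cfg, profile):
    """Return the profile's edge counts only if they match this exact CFG."""
    if profile["fingerprint"] != cfg_fingerprint(cfg):
        return None  # stale: the compiler produced a different CFG this time
    return profile["edge_counts"]


# Hypothetical CFG the profile was collected on.
build_1 = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
profile = {"fingerprint": cfg_fingerprint(build_1),
           "edge_counts": {("entry", "loop"): 1, ("loop", "loop"): 999,
                           ("loop", "exit"): 1}}

# Same source, but on this build the compiler (hypothetically) peeled the loop.
build_2 = {"entry": ["loop.peel"], "loop.peel": ["loop", "exit"],
           "loop": ["loop", "exit"], "exit": []}

print(apply_profile(build_1, profile) is not None)
print(apply_profile(build_2, profile))
```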


muffa 713 days ago

I know very little about compilers or bolt.

But what you just described sounds awesome! (And crazy.)
