by bertr4nd 2327 days ago
I wrote a little tool that used perf profiles from our production fleet to generate a custom linker script that reordered our main server program’s binary to be significantly more cache friendly. The heuristic I came up with for reordering was one of the few (maybe the only) genuine “eureka” moments I’ve had in my career.

And the performance win was extremely nice :-)

2 comments

For anyone curious about using something like this technique, Facebook has a similar tool.

https://github.com/facebookincubator/BOLT

Yes! In fact that’s from the same team I was working on at the time (HHVM). My tool predated BOLT by a few years, I think. Obviously BOLT is way more sophisticated and does an even better job.
Interesting, got a blog that goes into more detail?
We wrote this paper about it for CGO: https://research.fb.com/publications/optimizing-function-pla....

But the paper actually describes a significantly more sophisticated heuristic. My initial implementation simply used the number of perf samples divided by the size of the function, which helps make sure you’re getting the most out of your I-TLB. It worked shockingly well for its simplicity.
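
For anyone who wants to play with the simple version, here is a rough sketch of the "samples divided by size" idea in Python. It is not the original tool: the `FuncProfile` type, the function names, and the sample numbers are all made up for illustration, and it assumes the binary was built with `-ffunction-sections` so each function sits in its own `.text.<name>` input section that a GNU ld linker script can order. Sorting hottest-density-first is my reading of the heuristic, not something the comment spells out.

```python
from dataclasses import dataclass


@dataclass
class FuncProfile:
    """Hypothetical per-function record aggregated from perf data."""
    name: str      # symbol name (mangled, in a real C++ binary)
    samples: int   # number of perf samples attributed to the function
    size: int      # function size in bytes


def hotness_density(f: FuncProfile) -> float:
    # Samples per byte: small, hot functions score highest.
    return f.samples / f.size if f.size > 0 else 0.0


def order_functions(profiles):
    # Densest first, so the hottest code is packed into the fewest pages.
    # Ties are broken by name to keep the output deterministic.
    return sorted(profiles, key=lambda f: (-hotness_density(f), f.name))


def linker_script_fragment(ordered):
    # One input-section pattern per function, in the chosen order.
    # Assumption: this fragment is pasted inside the .text output
    # section of a GNU ld linker script.
    return "\n".join(f"*(.text.{f.name})" for f in ordered)


# Made-up numbers, for illustration only.
profiles = [
    FuncProfile("parse_request", samples=9000, size=3000),
    FuncProfile("hash_lookup", samples=4000, size=200),
    FuncProfile("log_error", samples=10, size=500),
]

ordered = order_functions(profiles)
fragment = linker_script_fragment(ordered)
print(fragment)
```

Note that `parse_request` has the most samples in absolute terms, but `hash_lookup` is placed first because it packs more samples into each byte, which is the point of dividing by size.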