by azurezyq 1425 days ago
I can share mine. It's an ads retrieval system. It is very latency-sensitive, so it has to be efficient. To avoid memory allocations, special hash tables with a fixed number of buckets (open addressing) are used in multiple places in query processing. The default is 1000 buckets. However, there are cases where a table holds only a handful of elements. In those cases the few entries are spread across a mostly empty table, so it makes poor use of the CPU cache and ends up slower.

The solution was to tune the number of buckets using information derived from the pprof call graph.
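
For anyone curious what that kind of tuning might look like, here is a minimal sketch, not the actual system. The names `buckets_for` and `FixedTable`, the 0.5 load factor, and the minimum of 8 buckets are all assumptions for illustration; `expected_items` stands in for whatever per-call-site number the profile suggested. Python won't show the cache effect itself, so this only illustrates the sizing logic and the fixed-bucket, open-addressing layout.

```python
import math


def buckets_for(expected_items, max_load=0.5, min_buckets=8):
    """Smallest power of two that keeps the load factor at or below max_load.

    Hypothetical helper: the 0.5 load factor and minimum of 8 are assumptions.
    """
    needed = max(min_buckets, math.ceil(expected_items / max_load))
    n = 1
    while n < needed:
        n *= 2
    return n


class FixedTable:
    """Open-addressing hash table with a fixed bucket count (linear probing).

    The arrays are allocated once in __init__ and never resized, mirroring
    the "no allocations during query processing" constraint.
    """

    _EMPTY = object()  # sentinel marking an unused bucket

    def __init__(self, num_buckets):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a positive power of two")
        self._mask = num_buckets - 1
        self._keys = [self._EMPTY] * num_buckets
        self._vals = [None] * num_buckets
        self.size = 0

    def put(self, key, value):
        i = hash(key) & self._mask
        for _ in range(len(self._keys)):
            k = self._keys[i]
            if k is self._EMPTY:
                self._keys[i] = key
                self._vals[i] = value
                self.size += 1
                return
            if k == key:
                self._vals[i] = value  # overwrite existing entry
                return
            i = (i + 1) & self._mask  # probe the next bucket
        raise OverflowError("table is full; it never resizes")

    def get(self, key, default=None):
        i = hash(key) & self._mask
        for _ in range(len(self._keys)):
            k = self._keys[i]
            if k is self._EMPTY:
                return default
            if k == key:
                return self._vals[i]
            i = (i + 1) & self._mask
        return default


# A call site that only ever sees a handful of elements gets a small table
# instead of the 1000-bucket default.
small = FixedTable(buckets_for(4))
for n in range(4):
    small.put(("ad", n), n * 10)
```

With these assumed parameters, a call site expecting 4 elements gets 8 buckets, while one expecting 500 gets 1024.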

There were others too, like redundant serialization, etc. But this one is the most interesting.

2 comments

That's surprising. If I were writing this, I'd have instrumented the bucket code to (optionally) log usage, and probably added an alert.

(being an armchair expert is easy though)
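
A rough sketch of the instrumentation idea above, under stated assumptions: `UsageStats`, the `bucket_usage` logger name, and the 5% threshold are hypothetical, and the "alert" here is just a warning-level log line rather than a real alerting hook.

```python
import logging

log = logging.getLogger("bucket_usage")  # hypothetical logger name


class UsageStats:
    """Tracks how many buckets a fixed-size table actually uses per query."""

    def __init__(self, num_buckets, low_water=0.05):
        self.num_buckets = num_buckets
        self.low_water = low_water  # assumed threshold: under 5% use is "oversized"
        self.samples = 0
        self.total = 0
        self.peak = 0

    def record(self, used):
        """Call once per query with the number of occupied buckets."""
        self.samples += 1
        self.total += used
        self.peak = max(self.peak, used)

    def oversized(self):
        """True if even the peak occupancy stayed below the threshold."""
        return self.samples > 0 and self.peak / self.num_buckets < self.low_water

    def report(self):
        """Log a warning when the table looks far larger than it needs to be."""
        if self.oversized():
            log.warning(
                "table with %d buckets peaked at %d entries (mean %.1f over %d samples)",
                self.num_buckets, self.peak, self.total / self.samples, self.samples,
            )


stats = UsageStats(num_buckets=1000)
for used in (3, 5, 2):
    stats.record(used)
stats.report()
```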

I also heavily used callgrind/cachegrind to tune critical paths in our high-performance web proxy, where each micro/millisecond counts… For example, in media type detection, which is called multiple times per request (at least twice, for request and response), etc.