Hacker News new | ask | show | jobs
by jandrewrogers 4708 days ago
Both GCC and Clang are surprisingly mediocre at this kind of optimization. I write a lot of extreme performance integer algorithms and those compilers only seem to find "obvious" parallel instruction schedules about half the time even in isolated contexts.

Fortunately, it is pretty simple to induce the desired optimization from the C code without resorting to much cleverness. The compilers miss these optimizations often enough that I frequently double check if I care. Still, it requires fairly detailed knowledge of the microarchitecture.

I do not do microarchitecture optimization work very often. The last time I did, it was to design a faster, better hash function to replace Google's CityHash (and the result was faster and stronger). For most codes, memory behaviors dominate with respect to performance.

2 comments

> Both GCC and Clang are surprisingly mediocre at this kind of optimization.

Clang, at least, has gotten significantly better in the past year (I haven’t used GCC for a while), but there’s certainly room for improvement. Please report bugs for cases that your compiler misses if you can spare a few minutes.

Have you published your hash function?
Not yet but will soon. It is not just one function but an entire family of functions with some interesting aspects beyond just the algorithms.

The hash functions were algorithmically optimized around a scaffolding I designed that guaranteed certain performance characteristics and easy analyzability. It has literally produced many, many thousands of high quality hash functions. Tens of thousands of CPU hours have been burned on the optimization process, which is still running, and the ones that have been put into use so far were early samples pulled out of that process but the hash functions still being processed in the pipeline are statistically more robust than the earlier versions. Much easier than trying to design them the old fashioned way. At some point soon, since the optimization is converging, I will evaluate the most promising parts of the phase space to select the strongest and most aesthetic functions and publish those into the public domain.

how does it compare to, say, tabulation as in

Mihai Patrascu, Mikkel Thorup: The Power of Simple Tabulation Hashing. J. ACM 59(3): 14 (2012) http://doi.acm.org/10.1145/2220357.2220361 (free version: http://arxiv.org/abs/1011.5200)

or also

Mihai Patrascu, Mikkel Thorup: Twisted Tabulation Hashing. SODA 2013: 209-228 http://knowledgecenter.siam.org/0236-000005/

This is exactly what I was planning on doing one of these days. After looking at CityHash I was sure it could be beat.