by rany_ 1126 days ago
This is, at its core, a software distribution problem. Windows, Linux distros, etc. need a mechanism by which the OS can request a more optimized build of a package.

I think this is most easily fixed for Linux distros: all it takes is for them to create a new architecture, say amd64_avx, which contains only packages built with AVX optimizations enabled where applicable.


One complication is that AVX is not one but more than a dozen ISA extensions, each of which may or may not be implemented on a particular processor. This means software delivered to a customer should ideally check CPUID at runtime and dispatch to the appropriate processing kernel. https://en.m.wikipedia.org/wiki/AVX-512
If someone just says "AVX" they usually mean AVX(1). And that's what the article is discussing.

AVX-512 forking into tons of different beasts is a separate, but related, problem. But it's more like how SSE2, 3, 4, 4.1 and so on existed.

Sometimes people said "SSE" when they meant one of the later versions, but I don't hear the same with AVX, since people seem to say AVX2 and AVX512 very explicitly.

I also want to point out that there’s the extreme-crazy option that is Gentoo Linux.

All packages are distributed as source and compiled on the destination machine before being installed there. And yes, you can modify the build configuration per package to enable or disable individual compile flags.

I recommend doing it for fun. It’s a crazy world.

The article was referring to the original AVX1, not the subsequent variants. Likewise, I was referring to the first set of extensions.
It also seems like ClickHouse could offer runtime detection of AVX and dispatch to the optimized functions in that case.
Yes, here is the article about the techniques: https://maksimkita.com/blog/cpu-dispatch-in-clickhouse.html
It’s considerably more onerous than just compiling for one or more microarchitectures, though. Plus, when you do this, you need to split this code out so it is conditionally compiled and you can still support other architectures like ARM.
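A rough sketch of what that split can look like, assuming GCC or Clang and their per-function `target` attribute. This is not necessarily how ClickHouse does it, and `HAVE_X86_DISPATCH`, `SumBaseline`, `SumAvx2`, `ChooseSum` and `Sum` are hypothetical names. The AVX2 variant and the CPU check are compiled only on x86-64, so the same file still builds on ARM.

```cpp
#include <cstddef>

// Hypothetical guard macro: 1 only where the x86 variant can be compiled.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Portable baseline, compiled for every architecture.
float SumBaseline(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if HAVE_X86_DISPATCH
// Same loop, but the compiler may emit AVX2 instructions in this function
// only; the rest of the translation unit keeps the default target.
__attribute__((target("avx2")))
float SumAvx2(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
#endif

using SumFn = float (*)(const float*, std::size_t);

// Decide which variant to use, based on what the running CPU reports.
SumFn ChooseSum() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return SumAvx2;
#endif
    return SumBaseline;
}

// Public entry point: the choice is made on the first call and then cached.
float Sum(const float* data, std::size_t n) {
    static const SumFn fn = ChooseSum();
    return fn(data, n);
}
```

Every extra target means another copy of the kernel plus another guard, which is where the maintenance cost comes from.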
Here is an example of how to do this using github.com/google/highway: https://gcc.godbolt.org/z/zP7MYe9Yf

You write the code only once and do not have to worry about any #pragma/conditional compilation. Just copy-paste about a dozen lines of boilerplate, link with the Highway library, and done.

Disclosure: I am the main author; happy to discuss.

That’s great! Never seen this library before. It’s much neater than the other approaches I’ve seen/used.
Thanks, glad to hear :) Feedback is always welcome; do let us know via a GitHub issue if there is anything you think can be improved.