by jbk
817 days ago
> Scalar loopy C code sure. The auto vectorization is not great. Stop considering people as idiots. People do that because it’s a LOT faster, not just a bit. If you are so able, please show us your results. Dav1d is full open source, fully documented, and with quite simple C code. Show your results.
Not GP, but here’s an example where intrinsics outperformed assembly by an order of magnitude: https://news.ycombinator.com/item?id=36624240
That comparison was AVX2 SIMD intrinsics versus scalar assembly, but I doubt hand-written AVX2 assembly would substantially improve the performance of my C++. The compiler did a decent job allocating the vector registers, and the generated assembly is not bad; there is not much left to improve.
It’s interesting how close your 800% is to my 1000%. For this reason, I suspect you tested the opposite: naïve C or C++ versus SIMD assembly. Or maybe you tested automatically vectorized C or C++ code; automatic vectorizers often fail to deliver anything good.
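
To make the distinction concrete, here is a minimal sketch of what "naïve scalar C++" versus "AVX2 intrinsics" looks like for a trivial reduction. It is not the code from the linked comment; the function names (`sumScalar`, `sumAvx2`, `sum`) are made up for illustration. It assumes GCC or Clang, since it relies on `__attribute__((target("avx2")))` and `__builtin_cpu_supports`, and it falls back to the scalar loop on anything that is not x86.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX2_PATH 1  // hypothetical macro, only for this sketch
#endif

// Naive scalar loop. The compiler may or may not auto-vectorize it.
std::uint32_t sumScalar(const std::uint32_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

#ifdef SKETCH_HAVE_AVX2_PATH
// Same reduction with AVX2 intrinsics: 8 lanes of 32-bit integers per add.
// Register allocation and scheduling are still left to the compiler.
__attribute__((target("avx2")))
std::uint32_t sumAvx2(const std::uint32_t* p, std::size_t n) {
    __m256i acc = _mm256_setzero_si256();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256i v =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p + i));
        acc = _mm256_add_epi32(acc, v);
    }
    // Horizontal sum: 8 lanes -> 4 lanes -> 2 lanes -> 1 lane.
    __m128i v4 = _mm_add_epi32(_mm256_castsi256_si128(acc),
                               _mm256_extracti128_si256(acc, 1));
    v4 = _mm_add_epi32(v4, _mm_shuffle_epi32(v4, _MM_SHUFFLE(1, 0, 3, 2)));
    v4 = _mm_add_epi32(v4, _mm_shuffle_epi32(v4, _MM_SHUFFLE(2, 3, 0, 1)));
    std::uint32_t s = static_cast<std::uint32_t>(_mm_cvtsi128_si32(v4));
    // Scalar tail for the last n % 8 elements.
    for (; i < n; i++)
        s += p[i];
    return s;
}
#endif

// Picks the AVX2 path at runtime when the CPU supports it.
std::uint32_t sum(const std::uint32_t* p, std::size_t n) {
#ifdef SKETCH_HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        return sumAvx2(p, n);
#endif
    return sumScalar(p, n);
}
```

A loop this simple is one that auto-vectorizers usually handle fine, so don't expect a big gap here; the point is only to show what each side of the comparison looks like. The large differences tend to show up in kernels with shuffles, saturation, or data-dependent control flow, where you have to benchmark on your own hardware.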