by dyaroshev 527 days ago
Hi!

Thanks for your interest in the library.

Here is a godbolt example: https://godbolt.org/z/bEGd7Tnb3

Here is a bunch of simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...

I personally think we have the following strengths:

* Algorithms. Writing SIMD loops is very hard. We give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).
* zip and SOA support out of the box.
* High-quality codegen. I haven't seen other libraries care about unrolling or aligning data accesses, yet these give you substantial improvements.
* Supporting more than transform/reduce. For example, we have a really decent compress implemented for sse/avx/neon.

The following weaknesses:

* We don't support runtime-sized sve/rvv (only fixed size). We tried really hard, but unfortunately the C++ language just refuses to play ball there. Here is a discussion about that: https://stackoverflow.com/questions/73210512/arm-sve-wrappin...

If this is something you need, we recommend compiling a few dynamic libraries with the correct fixed lengths. Google Highway manages to pull it off, but the trade-off is a variadics interface that I personally find very difficult.

* Runtime dispatch based on arch.

We again recommend dlls for this. The problem here is ODR. I believe there is a solution based on the preprocessor and namespaces that I could use, but it breaks as soon as modules become a thing. So, in the module world, we don't have an option. I'm open to suggestions.
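To make the preprocessor-and-namespaces idea concrete, here is a minimal sketch in plain C++. It is not EVE's or Highway's mechanism; the names `Level`, `DEFINE_SUM_KERNEL`, `target_baseline`, `target_wide`, and `select_sum` are all hypothetical. The assumption is that each namespace would be compiled in its own translation unit with different architecture flags, so the namespace keeps the symbols distinct and the linker never merges two differently compiled copies of the same inline function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical feature levels. A real build would query the CPU at startup.
enum class Level { baseline, wide };

// Stamp the same kernel body into one namespace per target.
// Assumption: in a real project each expansion lives in a separate
// translation unit built with different architecture flags.
#define DEFINE_SUM_KERNEL(NS)                               \
  namespace NS {                                            \
  inline long sum(const int* data, std::size_t n) {         \
    long total = 0;                                         \
    for (std::size_t i = 0; i < n; ++i) total += data[i];   \
    return total;                                           \
  }                                                         \
  }

DEFINE_SUM_KERNEL(target_baseline)
DEFINE_SUM_KERNEL(target_wide)
#undef DEFINE_SUM_KERNEL

using SumFn = long (*)(const int*, std::size_t);

// Choose an implementation once, at runtime, and call through the pointer.
inline SumFn select_sum(Level level) {
  assert(level == Level::baseline || level == Level::wide);
  return level == Level::wide ? &target_wide::sum : &target_baseline::sum;
}
```

The macro is exactly the part that does not carry over to modules, since a module interface cannot be re-expanded under a different namespace per target the way a header can.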

* No MSVC support

MSVC's C++20 support is still not good enough, and each new version breaks something that was already working. Sad times.

* Just tricky to get started.

I don't know what to do about that. I'm happy to just write examples for people. If you want to try the library, please create an issue or a discussion; I'm happy to take some time and try to solve your case.

We talked about the library at CppCon: https://youtu.be/WZGNCPBMInI?si=buFteQB1e1vXRT5M

If you want to learn how SIMD algorithms work, here are a couple of talks I gave: https://youtu.be/PHZRTv3erlA?si=b87DBYMDskvzYcq1 https://youtu.be/vGcH40rkLdA?si=WL2e5gYQ7pSie9bd

Feel free to ask any questions.


> Google Highway manage to pull it off but the trade off is a variadics interface that I personally find very difficult.

I'm curious what you mean by 'variadics', and what exactly you find difficult?

People new to Highway are often surprised by the d/tag argument to loads, which says whether to load a half or full vector, or no more than 4 elements, etc. The key is to understand that these are just zero-sized structs used for type information, and are not the actual vector/data. After that, I observe that introductory workshop participants are able to get started and become productive quickly.
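As an illustration of the tag idea in general, here is a small sketch in plain C++. It deliberately does not use Highway's real types or functions; `LaneTag` and `load` are hypothetical names, and `std::array` stands in for a vector register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical tag: the element type and lane count live in the type only.
template <class T, std::size_t N>
struct LaneTag {
  using value_type = T;
  static constexpr std::size_t lanes = N;
};

// The tag has no data members, so passing it by value carries no data.
static_assert(std::is_empty<LaneTag<float, 4>>::value,
              "a tag is pure type information");

// The tag argument selects how many elements are loaded.
// std::array is a stand-in for a real vector type.
template <class T, std::size_t N>
std::array<T, N> load(LaneTag<T, N>, const T* src) {
  assert(src != nullptr);
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = src[i];
  return out;
}
```

Calling `load(LaneTag<float, 4>{}, p)` and `load(LaneTag<float, 2>{}, p)` on the same pointer yields a "full" and a "half" result; the tag object itself is never read.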

I struggle to read the Highway documentation; it focuses on things that are not relevant to me. So sorry if I'm wrong.

Let me write the std::ranges code and ask you to write the same with Highway.

https://godbolt.org/z/3s1b8P3sj

PS: this is how it looks in eve: https://godbolt.org/z/Kzxqqdrez

Thanks for sharing :) Any thoughts on what kind of things you are looking for and didn't find?

I cannot recall anyone saying this kind of thing is a bottleneck for them. We don't use std::ranges, but searching for a negative value can look like: https://gcc.godbolt.org/z/8bbb16Eea

Doesn't it look like smaller codegen than EVE's? https://godbolt.org/z/fEn9r175v

Thanks for this example.

Can you write the second one too? With two ranges? That's where I believe the variadics will be.
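For readers who do not open the links, here is a scalar baseline of the two kinds of search being discussed, written with the standard library only. The godbolt contents are not reproduced here, so the predicates are assumptions: "first negative element" for one range (as mentioned above) and, purely as an illustrative guess, "first position where `a[i] > b[i]`" for two ranges. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One range: index of the first negative element, or v.size() if none.
inline std::size_t first_negative(const std::vector<int>& v) {
  auto it = std::find_if(v.begin(), v.end(), [](int x) { return x < 0; });
  return static_cast<std::size_t>(it - v.begin());
}

// Two ranges: index of the first i with a[i] > b[i] (assumed predicate),
// or the length of the shorter range if there is no such position.
inline std::size_t first_greater(const std::vector<int>& a,
                                 const std::vector<int>& b) {
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] > b[i]) return i;
  }
  return n;
}
```

The two-range version is the interesting one for a SIMD library, because every step has to load from both inputs in lockstep and stop at the shorter one.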

FYI: the codegen is smaller because the loop is not unrolled. That is 2x slower in my measurements. Also, I don't see any aligning of memory accesses, which would give you roughly another third of improvement when the data is in L1. You really should fix that.
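To show what "unrolled" means structurally, here is a scalar stand-in, not EVE's implementation and not a performance claim. `kChunk` plays the role of one SIMD register's worth of elements and `kUnroll` is the unroll factor; both values and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kChunk = 4;   // stand-in for the lanes of one register
constexpr std::size_t kUnroll = 4;  // chunks tested per loop iteration

// True if any of the kChunk elements starting at p is negative.
inline bool chunk_has_negative(const int* p) {
  bool any = false;
  for (std::size_t i = 0; i < kChunk; ++i) any |= (p[i] < 0);
  return any;
}

// Index of the first negative element, or n if there is none.
inline std::size_t find_negative_unrolled(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  constexpr std::size_t step = kChunk * kUnroll;
  std::size_t i = 0;
  // Main loop: test kUnroll chunks, then take a single combined branch.
  for (; i + step <= n; i += step) {
    bool hit = false;
    for (std::size_t u = 0; u < kUnroll; ++u) {
      hit |= chunk_has_negative(data + i + u * kChunk);
    }
    if (hit) break;  // the exact position is located by the scan below
  }
  // Tail, or the block that reported a hit: plain element-by-element scan.
  for (; i < n; ++i) {
    if (data[i] < 0) return i;
  }
  return n;
}
```

The larger code size comes from the main loop body being repeated per chunk, plus the separate tail handling.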

We have a different philosophy: not supporting/encouraging needlessly SIMD-hostile software. We assume users properly allocate their data, for example using the allocator we provide. It is easy to deal with 2K aliasing in the allocator, but much harder later. At least in my opinion, this seems like a better path than penalizing all users with unnecessary (re)alignment code.
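As a sketch of what "properly allocate their data" can look like, here is a minimal C++17 aligned allocator usable with `std::vector`. It is not the allocator Highway ships, it does nothing about 2K aliasing, and the name `AlignedAllocator` and the 64-byte default are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator: every allocation starts on an Alignment boundary.
template <class T, std::size_t Alignment = 64>
struct AlignedAllocator {
  static_assert((Alignment & (Alignment - 1)) == 0,
                "Alignment must be a power of two");
  using value_type = T;

  AlignedAllocator() = default;
  template <class U>
  AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

  // Explicit rebind is required because of the non-type template parameter.
  template <class U>
  struct rebind { using other = AlignedAllocator<U, Alignment>; };

  T* allocate(std::size_t n) {
    // C++17 aligned operator new; throws std::bad_alloc on failure.
    return static_cast<T*>(
        ::operator new(n * sizeof(T), std::align_val_t{Alignment}));
  }
  void deallocate(T* p, std::size_t) noexcept {
    ::operator delete(p, std::align_val_t{Alignment});
  }
};

// Stateless allocators always compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept { return false; }
```

With this, `std::vector<float, AlignedAllocator<float>>` gives a buffer whose `data()` pointer is 64-byte aligned, so a loop over it can skip alignment peeling at the start.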

We have not added a FindIf for two ranges because no one has yet requested that or mentioned it is time-critical for their use cases.

That definitely doesn't apply to unrolling.

In eve, the ability to zip ranges is fundamental and very important.