Author here. I knew that some people would react negatively to the term, but I can assure you the intention is for you to have a better understanding of exactly how and when you should use Spice and Rayon. I would recommend reading the benchmark document: https://github.com/judofyr/spice/blob/main/bench/README.md.

What people typically do when comparing parallel code is to only compare the sequential/baseline version with a parallel version running on all threads (16). Let's use the numbers for Rayon that I got for the 100M case:

- Sequential version: 7.48 ns.
- Rayon: 1.64 ns.

Then they go "For this problem Rayon showed a 4.5x speed-up, but uses 16 threads. Oh no, this is a bad fit." That's very true, but you don't learn anything from that. How can I apply this knowledge to other types of problems?

However, if you run the same benchmark on a varying number of threads you learn something more interesting: The scheduler in Rayon is actually pretty good at giving work to separate threads, but the overall work execution mechanism has a ~15 ns overhead. Despite this being an utterly useless program we've learnt something that we can apply later on: Our smallest unit of work should probably be a bit bigger than ~7 ns before we reach for Rayon. (Unless it's more important for us to reduce overall latency at the cost of the throughput of the whole system.)

In comparison, if you read the Rayon documentation they will not attempt to give you any number. They just say "Conceptually, calling join() is similar to spawning two threads, one executing each of the two closures. However, the implementation is quite different and incurs very low overhead": https://docs.rs/rayon/latest/rayon/fn.join.html.

(Also: If I wanted to be misleading I would say "Spice is twice as fast as Rayon since it gets 10x speed-up compared to 4.5x speed-up".)
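For anyone who wants to redo the arithmetic behind the "4.5x on 16 threads" remark, here is a minimal sketch using only the numbers quoted above. The `speedup` and `efficiency` helpers are hypothetical names made up for this illustration, and "efficiency" (speed-up divided by thread count) is just the usual textbook ratio, not a figure taken from the benchmark document.

```rust
/// Speed-up of a parallel run relative to the sequential baseline.
fn speedup(sequential_ns: f64, parallel_ns: f64) -> f64 {
    sequential_ns / parallel_ns
}

/// Fraction of the ideal linear speed-up that was actually achieved
/// (assumed definition: speed-up divided by the number of threads).
fn efficiency(speedup: f64, threads: u32) -> f64 {
    speedup / f64::from(threads)
}

fn main() {
    let threads = 16;

    // Rayon, 100M case: 7.48 ns sequential vs 1.64 ns parallel.
    let rayon = speedup(7.48, 1.64);
    println!(
        "Rayon: {:.2}x speed-up, {:.1}% of ideal on {} threads",
        rayon,
        100.0 * efficiency(rayon, threads),
        threads
    );

    // Spice: only the 10x speed-up figure is quoted in the comment.
    let spice = 10.0;
    println!(
        "Spice: {:.2}x speed-up, {:.1}% of ideal on {} threads",
        spice,
        100.0 * efficiency(spice, threads),
        threads
    );
}
```

This only restates the single-point comparison; it says nothing about where the overhead comes from, which is why the run at varying thread counts is the more useful experiment.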
> Despite this being an utterly useless program we've learnt something that we can apply later on: Our smallest unit of work should probably be a bit bigger than ~7 ns before we reach for Rayon.
That's a very interesting project.
The big limitation I see with the current approach is that the usability of the library is much worse than what Rayon offers.
The true magic of Rayon is that you just replace `iter()` with `par_iter()` in your code and voilà, you now have parallel execution. But yes, it has some overhead, so maybe Rayon could try to implement this kind of scheduling as an alternative so that people can pick what works best for their use case.