|
|
|
|
|
by silentvoice
2435 days ago
|
|
Hi I wrote the blog post linked - and I feel a little silly that I didn't check that _both_ loops vectorized. So I fixed the Rust implementation to keep a running vector of partial sums which I finish up at the end - this one did vectorize. The result was a 2X performance bump, which I'm about to include in the blog post as an update. If it's OK I'll link to this comment as the inspiration. On the iterators versus loop: for some reason when I use the raw loop _nothing_ vectorizes, not even the obvious loop. What I read online was that bounds checking happens inside the loop body because Rust doesn't know where those indices are coming from. Using iterators instead is supposed to fix this, and it did seem to in my experiments. |
|
See the generated assembler here: https://rust.godbolt.org/z/G5A2u0