by lovasoa
2424 days ago
It looks like what the author was looking for is f64::mul_add(self, a: f64, b: f64) -> f64 [1].

Adding it to the code does allow LLVM to generate the "vfma" instruction, but it didn't significantly improve performance, on my machine at least:

    $ ./iterators 1000
    Normalized Average time = 0.0000000011943495282455513
    sumb=89259.51980374461
    $ ./mul_add 1000
    Normalized Average time = 0.0000000011861410852805122
    sumb=89259.52037960211

Maybe the performance gap is not due to what the author thought...

[1] https://doc.rust-lang.org/std/primitive.f64.html#method.mul_...
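
For anyone who wants to try this themselves, here is a minimal sketch of the kind of change I mean. It is not the article's benchmark: the function names `dot_plain` and `dot_fma` and the input values are made up for illustration. The only real API used is `f64::mul_add`, which computes `self * a + b` with a single rounding step.

```rust
// Hypothetical helpers for illustration only; not the article's code.

/// Accumulates with a separate multiply and add (two roundings per step).
fn dot_plain(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).fold(0.0, |acc, (&x, &y)| acc + x * y)
}

/// Accumulates with f64::mul_add, i.e. x * y + acc with one rounding per step.
fn dot_fma(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}

fn main() {
    // Arbitrary sample data, chosen only so the program has something to sum.
    let a: Vec<f64> = (0..1000).map(|i| (i as f64) * 0.001).collect();
    let b: Vec<f64> = (0..1000).map(|i| 1.0 / ((i + 1) as f64)).collect();

    println!("plain = {}", dot_plain(&a, &b));
    println!("fma   = {}", dot_fma(&a, &b));
}
```

Two caveats. Because the fused version rounds once per step instead of twice, the two sums can differ in the last digits. And on x86_64 you will likely need to build with something like `-C target-cpu=native` (or `-C target-feature=+fma`) to get a hardware FMA instruction at all; without it, `mul_add` may fall back to a software routine that is slower than the plain multiply and add.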