by kierank
810 days ago
v210_planar_pack_8_c: 2298.5
v210_planar_pack_8_ssse3: 402.5
v210_planar_pack_8_avx: 413.0
v210_planar_pack_8_avx2: 206.0
v210_planar_pack_8_avx512: 193.0
v210_planar_pack_8_avx512icl: 100.0

23x speedup. The compiler isn't going to come up with some of the trickery to make this function 23x faster. 800% is nothing.
|
Based on the performance numbers, whoever wrote that test neglected to manually vectorize the C version, which is the only reason the assembly is 23x faster in that test. If they rework the C version with a focus on performance, i.e. using SIMD intrinsics, I'm pretty sure the difference between the C and assembly versions is going to be very unimpressive, something like a couple of percent.
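To make "manual vectorization with SIMD intrinsics" concrete, here is a minimal sketch. It is not the v210_planar_pack_8 function from the benchmark. It is a much simpler, hypothetical kernel (widening 8-bit samples to the 10-bit range) that I made up for illustration, and the function names are mine. It only shows the shape of the rewrite: a plain scalar loop next to an SSE2 intrinsics version of the same loop, in C, with no assembly. It says nothing about how large the real speedup would be.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: widen 8-bit samples to the 10-bit range,
 * one 16-bit word per sample. Hypothetical kernel, for illustration only. */
void widen_8_to_10_c(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)(src[i] << 2);
}

/* Same operation with SSE2 intrinsics: 16 samples per iteration.
 * Falls back to the scalar loop when SSE2 is not available. */
void widen_8_to_10_simd(const uint8_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        /* Load 16 bytes, unaligned. */
        __m128i in = _mm_loadu_si128((const __m128i *)(src + i));
        /* Zero-extend the low and high 8 bytes to 16-bit lanes. */
        __m128i lo = _mm_unpacklo_epi8(in, zero);
        __m128i hi = _mm_unpackhi_epi8(in, zero);
        /* Shift each 16-bit lane left by 2 and store 8 words at a time. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_slli_epi16(lo, 2));
        _mm_storeu_si128((__m128i *)(dst + i + 8), _mm_slli_epi16(hi, 2));
    }
#endif
    /* Scalar tail for the remaining samples. */
    for (; i < n; i++)
        dst[i] = (uint16_t)(src[i] << 2);
}
```

Both functions should produce identical output for any length, so the scalar one can serve as the reference when checking or benchmarking the intrinsics one.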