by unlord
591 days ago
It is actually quite a bit more misleading. I was not able to reproduce these numbers on Zen2 hardware, see https://people.videolan.org/~unlord/dav1d_6tap.png. I spoke with the slide author and he confirmed he was using an -O0 debug build of the checkasm binary. What's more, the C code runs an 8-tap filter, whereas the SIMD for that function (in all of SSSE3, AVX2 and AVX512) is implemented as 6-tap. Last week I posted MR !1745 (https://code.videolan.org/videolan/dav1d/-/merge_requests/17...), which adds 6-tap to the C code and improves performance on all platforms dav1d supports. This, of course, also closes the gap in these numbers, but it gives a more accurate picture of the speed-up from hand-written assembly.
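To illustrate the 8-tap vs 6-tap point, here is a minimal sketch, not the dav1d code. It assumes a kernel whose two outer coefficients are zero, in which case a 6-tap loop produces the same output as the 8-tap loop with fewer multiplies and loads. The names `kernel8`, `filter_8tap`, `filter_6tap` and `clip_u8` are hypothetical, and the coefficient values are only illustrative.

```c
#include <stdint.h>

/* Illustrative coefficients (an assumption, not copied from dav1d):
 * an 8-entry kernel whose outer taps are zero. The taps sum to 128. */
static const int8_t kernel8[8] = { 0, 2, -14, 76, 76, -14, 2, 0 };

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v) {
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Generic 8-tap filter: reads src[-3] .. src[4]. */
static uint8_t filter_8tap(const uint8_t *src, const int8_t *f) {
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i] * src[i - 3];
    /* Round and normalise by 128; assumes arithmetic right shift. */
    return clip_u8((sum + 64) >> 7);
}

/* 6-tap variant: skips f[0] and f[7], so it reads only src[-2] .. src[3].
 * Equivalent to filter_8tap only when both outer taps are zero. */
static uint8_t filter_6tap(const uint8_t *src, const int8_t *f) {
    int sum = 0;
    for (int i = 1; i < 7; i++)
        sum += f[i] * src[i - 3];
    return clip_u8((sum + 64) >> 7);
}
```

If the C reference always runs the 8-tap loop while the assembly runs a 6-tap one, a C vs SIMD benchmark is not comparing like with like, which is the mismatch described above.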