I actually tried comparing 128-bit SIMD against the 64-bit version, and the difference was 2x. I only published the results for the 4x comparison (32-bit integers), but it should be pretty easy to reproduce: change the types in the non-SIMD code[1] from i32 to i64.
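For context, 4x vs 2x is what you'd expect from lane counts alone: a 128-bit register holds four i32 values but only two i64 values. A quick sketch of that arithmetic (the `lanes` helper is just for illustration, it's not from the repo, and real speedups also depend on memory bandwidth and the runtime):

```rust
use std::mem::size_of;

// Hypothetical helper: how many elements of type T fit in one 128-bit SIMD register.
fn lanes<T>() -> usize {
    128 / (size_of::<T>() * 8)
}

fn main() {
    // 32-bit integers: 4 lanes, so the ideal speedup over scalar code is 4x.
    println!("i32 lanes: {}", lanes::<i32>());
    // 64-bit integers: 2 lanes, so the ideal speedup drops to 2x.
    println!("i64 lanes: {}", lanes::<i64>());
}
```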
[1] https://github.com/awelm/simd-wasm-profiling/blob/master/fil...