
by firethief 1644 days ago

> The [NEON] Wasm SIMD implementation is 65% faster than native! But what is perhaps more interesting is that the Wasm scalar implementation is only half as fast as the Wasm SIMD version instead of the 3x seen on x86. Perhaps v8 doesn’t have enough optimizations on the Wasm SIMD to Neon front.

That's almost exactly what I'd expect from an optimal compiler.

Graviton2 has 3 scalar integer ALUs and 2 128-bit vector units. Scalar code can do 3 integer ops per cycle; 4-lane vector code can do 8. 8/3 ≈ 2.67x (+167%). Intel processors typically have 4 scalar ALUs and 3 vector units: 12/4 = 3x.
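To make the back-of-the-envelope arithmetic explicit, here is a minimal sketch of the peak-throughput ratio. The function name `peak_speedup` and its parameters are made up for illustration, and it assumes 4 lanes per 128-bit vector (32-bit integers) with every unit kept busy each cycle, which real code rarely achieves.

```python
def peak_speedup(scalar_alus: int, vector_units: int, lanes: int = 4) -> float:
    """Ratio of peak vector integer ops/cycle to peak scalar integer ops/cycle.

    Assumes each vector unit retires one `lanes`-wide op per cycle and each
    scalar ALU retires one op per cycle (an idealized upper bound).
    """
    return (vector_units * lanes) / scalar_alus


# Unit counts as given in the comment above.
graviton2 = peak_speedup(scalar_alus=3, vector_units=2)
intel = peak_speedup(scalar_alus=4, vector_units=3)

print(f"Graviton2: {graviton2:.2f}x")  # Graviton2: 2.67x
print(f"Intel:     {intel:.2f}x")      # Intel:     3.00x
```

This is only a ceiling on the speedup: memory bandwidth, port restrictions, and operations that not every unit supports will all pull the real number down.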

Zen has 4 units for 128-bit vectors, though before Zen 3 not every unit could perform every operation, so the speedup in AMD land would be 2x-8x depending on the application (although code doing brief 128-bit vector work would be limited by Zen having only 1 vector write port).