| HN Mirror

	by akireu 1597 days ago
	It looks promising! But fixed-width lanes don't seem too cross-platform? I don't just mean the v256 and v512 types that may become ubiquitous in a few years, but also things like optimizing for different L1 cache sizes, doing some operation macro-fusion on the SIMD unit, or directly supporting leading/trailing elements to reduce code size?

kevingadd 1597 days ago

In practice it's not possible to optimize "generally" for all possible target architectures your wasm will run on. You're going to optimize for x86-64 or ARM, and probably going to specifically optimize for modern Intel, modern AMD, or Apple's M1. If you try to optimize for everything, you're going to run into really painful tradeoffs and probably end up with mediocre performance on a bunch of architectures after a lot of hard work.

skywal_l 1596 days ago

Wouldn't it be possible to have a binary containing multiple versions of your program, each compiled and optimized for a different CPU configuration, with a switch at runtime that selects one based on CPUID? I think Intel has a compiler for that.

janwas 1595 days ago

Yes, our github.com/google/highway does that for SSE4/AVX2/AVX-512. It targets at the level of instruction sets, though, not specific microarchitectures.
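
To make the idea of instruction-set-level dispatch concrete, here is a minimal sketch in C. It is not how Highway is implemented; it only shows the general pattern using the GCC/clang builtin `__builtin_cpu_supports` and the `target` function attribute. The function names (`sum`, `sum_scalar`, `sum_avx2`, `select_sum`) are made up for this illustration, and on non-x86 targets or other compilers the sketch simply falls back to the scalar version.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*sum_fn)(const float *, size_t);

/* Baseline version, compiled for whatever the default target is. */
static float sum_scalar(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler may use AVX2 instructions in this function. */
__attribute__((target("avx2")))
static float sum_avx2(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}
#endif

/* Pick an implementation once, based on what the host CPU reports. */
static sum_fn select_sum(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_scalar;
}

/* Public entry point: resolves the implementation on first call. */
float sum(const float *x, size_t n) {
    static sum_fn impl = NULL;
    assert(n == 0 || x != NULL);
    if (impl == NULL) impl = select_sum();
    return impl(x, n);
}
```

As in the comment above, the choice is made per instruction set (AVX2 or not), not per microarchitecture. A real library would also need to make the first-call initialization thread-safe.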

akireu 1597 days ago

Why not? Fixed-size SIMD architectures use mostly the same operations, so if you target SSE2 initially, the code should run just fine on NEON. A runtime that ships a JIT compiler also has the unique opportunity to further optimize SIMD code by using more lanes or limiting the working set to the host platform's L1 cache size. Even the AOT compilers like GCC or clang emulate platform-specific intrinsics using generic vector ones. This should count for something, no?
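
For readers unfamiliar with the generic vector types mentioned here, the following is a small sketch using the GCC/clang `vector_size` extension. The same source is lowered to SSE on x86 and to NEON on ARM by the compiler. The names `f32x4` and `add_arrays` are chosen for this example only, and the code assumes a compiler that supports the extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A generic 128-bit vector of four floats (GCC/clang extension). */
typedef float f32x4 __attribute__((vector_size(16)));

/* dst[i] = a[i] + b[i], four lanes at a time, with a scalar tail. */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        f32x4 va, vb;
        memcpy(&va, a + i, sizeof va);   /* unaligned load */
        memcpy(&vb, b + i, sizeof vb);
        f32x4 vc = va + vb;              /* lane-wise add on the vector type */
        memcpy(dst + i, &vc, sizeof vc); /* unaligned store */
    }
    for (; i < n; ++i) dst[i] = a[i] + b[i]; /* trailing elements */
}
```

Simple lane-wise arithmetic like this ports cleanly. The explicit tail loop is the kind of leading/trailing-element handling that a fixed lane count forces on the programmer.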

TinkersW 1597 days ago

They are similar but not the same. For instance, SSE has movemask but NEON does not, so it gets emulated (slowly) when targeting that platform. The cross-lane ops are different enough that you might need to rewrite for other platforms. And then you run into situations where an instruction is very fast on one architecture but horribly slow on another because it's basically emulated.
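
As a reference for what has to be emulated: the byte variant of movemask (`_mm_movemask_epi8` in SSE2) collects the most significant bit of each of the 16 bytes in a vector into a 16-bit integer. The scalar C sketch below spells out that behavior; the function name `movemask_u8x16` is made up for this example. A platform without a native instruction has to build the same result out of several shifts, masks, and reductions, which is where the slowdown comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: bit i of the result is the top bit of bytes[i]. */
uint16_t movemask_u8x16(const uint8_t bytes[16]) {
    assert(bytes != NULL);
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        unsigned top = (unsigned)(bytes[i] >> 7); /* 0 or 1 */
        mask = (uint16_t)(mask | (top << i));
    }
    return mask;
}
```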

akireu 1597 days ago

This isn't really relevant to wasm, though. You can't expect it to support platform-specific hacks just for SIMD, so you'll have to make do with the lowest common denominator anyway.

hackthesystem 1597 days ago

Thanks! Good points, I think in general the fixed-width "packed" SIMD ISAs have the downsides that you mentioned.

But it seems that WebAssembly doesn't have length-agnostic SIMD instructions yet. There is an open proposal to add this though: https://github.com/WebAssembly/flexible-vectors
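
To illustrate what "length-agnostic" means in general, here is a scalar C sketch of the loop shape such code takes. This is not the flexible-vectors instruction set; `scale` and `max_lanes` are hypothetical names, and the inner loop merely stands in for a single vector operation over `vl` lanes. The point is that the lane count is a runtime value supplied by the host rather than a constant baked into the code.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element by k, processing up to max_lanes elements per step. */
void scale(float *x, size_t n, float k, size_t max_lanes) {
    assert(max_lanes > 0);
    assert(n == 0 || x != NULL);
    for (size_t i = 0; i < n; ) {
        /* Active lanes this iteration: a full vector, or whatever remains. */
        size_t remaining = n - i;
        size_t vl = remaining < max_lanes ? remaining : max_lanes;
        for (size_t j = 0; j < vl; ++j) {
            x[i + j] *= k; /* stands in for one vector multiply over vl lanes */
        }
        i += vl;
    }
}
```

The same code gives the same result whether `max_lanes` is 4 or 16, and the final partial step replaces a separate tail loop.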
