| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by confuseshrink 2172 days ago

Vectorization: I'm not an expert in this area so I can only tell you what I've personally found difficult in dealing with vectorization. Usually it all comes down to alignment and vector lanes. To utilize the vector instructions you basically have to paint your memory into separate (but interleaved) regions that can be mapped to distinct vector lanes efficiently. Everything is fine as long as no two elements from separate lanes have to be mixed in some way, as soon as your computation requires that you incur a heavy cost.

Dealing with these issues might require you to know the corners of the instruction set really well or some times the solution is outside of the instruction set and is related to how your data structure is laid out in memory leading you to AoS vs SoA analysis etc.

Compilers and vectorization: Based on reading a lot of assembly output I think what compilers usually struggle with are assumptions that the human programmer know hold for a given piece of code, but the compiler has no right to make. Some of this is basic alignment, gcc and clang have intrinsics for these. Some times it's related to the memory model of the programming language disallowing a load or a store at specific points.

GPGPU programmability: GPUs being easy to program is something I take with a grain of salt, yes it's easy to get up and running with CUDA. Making an _efficient_ CUDA program however is easily as challenging if not more than writing an efficient AVX program.