	by pcwalton 4291 days ago
	Autovectorization has been an area of intense compiler effort for a decade or more and by and large the primary customers of it (games, video codecs, etc.) prefer the intrinsics. It's perceived as too unreliable and brittle to be relied upon, and it's easy to see why: given a choice between having to think about what the compiler's alias analysis, overflow analysis, loop trip count analysis, etc. will do and just writing an intrinsic and calling it a day, programmers will choose the latter. This applies regardless of how good the autovectorization really is: it's in a weird catch-22 kind of space where adding more and more features to your autovectorizer can actually reduce its perceived reliability, by making the answer to "will this vectorize?" harder and harder for a programmer to answer at a glance. <xmmintrin.h> has a lot of problems, but it's reliable, and at the end of the day that's what history has shown that game devs and video codec authors want.
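To make that trade-off concrete, here is a minimal sketch (not from the original comment; the function names `scale_plain`, `scale_restrict`, and `scale_sse` are made up for illustration). The first loop only vectorizes if the compiler can rule out overlap between `dst` and `src` or is willing to emit a runtime check, the second has the programmer promise no aliasing via `restrict`, and the third spells out the SSE instructions by hand. Whether the first two actually vectorize depends on the compiler, version, and flags, which is exactly the uncertainty being described.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86 targets only */

/* Plain loop: dst and src might overlap as far as the compiler knows,
   so vectorizing needs an alias proof or a runtime overlap check. */
void scale_plain(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Same loop, but the caller promises that dst and src do not alias. */
void scale_restrict(float *restrict dst, const float *restrict src,
                    float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-written SSE: four floats per iteration, then a scalar tail.
   Unaligned loads/stores are used so no alignment is assumed. */
void scale_sse(float *dst, const float *src, float k, size_t n) {
    __m128 vk = _mm_set1_ps(k);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, vk));
    }
    for (; i < n; i++)
        dst[i] = src[i] * k;
}
```

All three compute the same result for non-overlapping buffers; the only difference is how much the programmer has to guess about what the compiler will do.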

6 comments

DannyBee 4291 days ago

"Autovectorization has been an area of intense compiler effort for a decade or more and by and large the primary customers of it (games, video codecs, etc.) prefer the intrinsics. It's perceived as too unreliable and brittle to be relied upon, and it's easy to see why: given a choice between having to think about what the compiler's alias analysis, overflow analysis, loop trip count analysis, etc. will do and just writing an intrinsic and calling it a day, programmers will choose the latter."

100% true for C++ (though it would be more accurate to say "4 decades" if you want to count Fortran autovectorization, which has been going on since the late 1970s).

But, I'll point out, plenty of the time they end up writing intrinsics that are slower than what the compiler's autovectorizer produced for the same code.

(Plenty of the time they don't, too).

Additionally, all of the problems you mentioned are due to specific issues in C/C++. In other languages, autovectorization is not just "relied upon", it's basically "part of the standard" (see, e.g., Fortran 95).

All of the brittleness you talk about comes precisely from the lack of pointer safety, alignment issues, and all sorts of things that only exist in C/C++, where programmers have a lot of control. Given that, I'm not sure it makes sense to base your argument on the experience of a language that is very different from the one this API was designed for.

All that said, truthfully, IMHO, neither autovectorization, nor intrinsics at the level you are talking, make for a good programming model in most languages.

The intrinsics at this level don't get used effectively: among other reasons, they codegen differently on different platforms that don't have exactly the same SIMD semantics, which is "all of them" :P

I know you guys are trying to avoid this by limiting the ops available/etc. It is, IMHO, a losing game.

So you end up with the same problem: People write loops that are really bad on some platforms, and good on others.

Autovectorization knows what the target looks like, but doesn't trigger in some cases people want it to.

In the end, I think doing things like Halide is a lot more useful as a programming model than simd.js

simd.js is a usable implementation mechanism for some of those programming models, but I would not sell it as the programming model itself.

In fact, almost the exact set of intrinsics mentioned in simd.js was already available as generic operations on vectors in GCC (you can create a vector of four 32-bit floats in a platform-independent way, do normal ops on it, and it will codegen down to lower-level vector ops, without ever seeing xmmintrin). It was simultaneously not high-level enough and not low-level enough.

People resorted to the lower-level, platform-specific intrinsics to get better performance, or wrote higher-level libraries to get better abstraction.
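For readers who haven't seen the GCC feature, a minimal sketch of it looks roughly like this. It uses the `vector_size` attribute, which is a GCC/Clang extension rather than standard C; the type name `float32x4` is borrowed from simd.js purely for illustration, and `madd` and `sum4` are hypothetical helper names.

```c
/* GCC/Clang vector extension (not standard C): a 16-byte vector
   holding four 32-bit floats. The typedef name is illustrative. */
typedef float float32x4 __attribute__((vector_size(16)));

/* Ordinary operators apply lane-wise; the compiler lowers them to
   whatever vector instructions the target has (or to scalar code). */
float32x4 madd(float32x4 a, float32x4 b, float32x4 c) {
    return a * b + c;
}

/* Lanes can be read with subscripts; this is a plain horizontal sum. */
float sum4(float32x4 v) {
    return v[0] + v[1] + v[2] + v[3];
}
```

Nothing here names a specific instruction set, which is the appeal, and also why it gives no control over how a given target actually implements each operation.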

In any case, I'm sure it's faster than what you have now, and certainly an advance. I'd just be careful of thinking it's going to work all that well outside of targeted use cases.