by physicsguy
157 days ago
> every combination of compiler, optimization setting and platform I intend to support, disassemble all the resulting binaries, and analyze the disassembly to try to figure out if it did autovectorization in the way I expect

I just used to fire up VTune and inspect the hot loops... In my experience, if you care about this you're typically only working on hardware targeting the latest instruction sets anyway. I would only bother doing intrinsics all over the place if I were working on low-level libraries. For most consumer software you want to be able to fall back to some lowest-common-denominator hardware anyway, otherwise people using it run into issues. That's the same reason that Debian, Conda, etc. only go up to really old instruction sets.