| HN Mirror

Only with proper pragmas and loop constructs.

(1) auto-vectorization is enabled by default at -O3; (2) the loop constructs it needs are fairly simple; (3) arguing that unoptimized code will be slow is still a poor argument.
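For illustration, here is a minimal sketch of what a "fairly simple" loop construct looks like, assuming GCC or Clang at -O3. The function names `saxpy` and `prefix_sum` are just hypothetical examples, not from any library. The first loop has a counted trip count and no dependence between iterations, so it is the kind of loop auto-vectorizers are built for. The second carries a dependence from one iteration to the next, so it will typically stay scalar without hand-written code.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]  (the classic "saxpy" kernel).
 * `restrict` promises the compiler that x and y do not overlap, which
 * saves it from emitting a runtime aliasing check before the vector loop.
 * Simple counted loop, no cross-iteration dependence: a good candidate
 * for auto-vectorization at -O3, no pragma required. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* By contrast, each iteration here reads the result of the previous one
 * (a loop-carried dependence), so a straightforward auto-vectorizer
 * will typically leave this loop scalar. */
void prefix_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; i++) {
        v[i] += v[i - 1];
    }
}
```

To see what the compiler actually did, GCC can report vectorized loops with `gcc -O3 -fopt-info-vec-optimized -c saxpy.c`, and Clang with `clang -O3 -Rpass=loop-vectorize -c saxpy.c` (the file name is hypothetical).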

ARM64 does NOT grant you the vectorized instruction advantage.

32-bit ARM NEON does not support vectorized doubles. 64-bit ARM NEON does. Source: http://en.wikipedia.org/wiki/ARM_NEON#Advanced_SIMD_.28NEON....
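To make the difference concrete, here is a rough sketch assuming a GCC or Clang toolchain. On AArch64, `arm_neon.h` provides the two-lane double type `float64x2_t` along with `vld1q_f64`, `vaddq_f64` and `vst1q_f64`. The 32-bit ARMv7 NEON headers have no double-precision vector type, so the sketch only uses the intrinsics under `__aarch64__` and otherwise falls back to a scalar loop. The function name `add_f64` is hypothetical.

```c
#include <stddef.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for doubles.
 * On AArch64 a 128-bit NEON register holds two doubles, so the main loop
 * handles two elements per load/add/store. Other targets (including
 * 32-bit ARM, whose NEON lacks vector doubles) use only the scalar loop. */
void add_f64(size_t n, const double *a, const double *b, double *out)
{
    size_t i = 0;
#if defined(__aarch64__)
    for (; i + 2 <= n; i += 2) {
        float64x2_t va = vld1q_f64(a + i);     /* load a[i], a[i+1] */
        float64x2_t vb = vld1q_f64(b + i);     /* load b[i], b[i+1] */
        vst1q_f64(out + i, vaddq_f64(va, vb)); /* store both sums */
    }
#endif
    /* Scalar tail on AArch64, or the whole array elsewhere. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```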

So many people are arguing here, but clearly few of you have even worked with ARM chips at the assembly level.

Yep. Thankfully I can back up my arguments with quoted facts.

EDIT: And I already granted that vectorized and floating-point operations don't necessarily benefit from larger register widths, so I don't know why you're even arguing. Besides, the OP wasn't even asking specifically about ARM!