(1) auto-vectorization is enabled by default at -O3; (2) the loop constructs needed are fairly simple; (3) arguing that unoptimized code will be slow is still a poor argument.
ARM64 does NOT grant you the vectorized instruction advantage.
32-bit ARM NEON has no double-precision SIMD, so it cannot vectorize doubles. 64-bit ARM (AArch64) NEON can. Source: http://en.wikipedia.org/wiki/ARM_NEON#Advanced_SIMD_.28NEON....
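To make the "simple loop constructs" point concrete, here's a rough sketch of the kind of loop I mean. The function name `daxpy` and its signature are made up for illustration, not taken from anything upthread. A compiler that auto-vectorizes at -O3 should be able to handle a loop like this. On AArch64 I'd expect it to process two doubles per NEON register, which 32-bit NEON can't do, but I haven't pasted the generated assembly here, so check with -S on your own toolchain.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] over arrays of doubles.
 * `restrict` promises the compiler that x and y do not overlap,
 * so it does not need a runtime aliasing check before vectorizing. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    /* Unit stride, no branches, no cross-iteration dependency:
     * about as simple a candidate for the vectorizer as it gets. */
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Whether the vectorized version actually ends up faster still depends on the chip and on memory bandwidth. The sketch only shows that the source code doesn't need anything exotic.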
So many people are arguing here, but clearly few of you people have even worked with ARM chips at the assembly level.
Yep. Thankfully I can back up my arguments with quoted facts.
EDIT: I already granted that vectorized and floating-point operations don't necessarily benefit from larger register widths, so I don't know why you're even arguing. Besides, the OP wasn't even asking specifically about ARM!