	by hrydgard 4573 days ago
	32-bit ARM did not have double-precision SIMD (vector) support, while 64-bit ARM does, so there it's definitely a win. On x86, though, both 32-bit and 64-bit handle double-precision vectors just fine, so it doesn't really apply there (except that the FP register count was doubled).

dragontamer 4573 days ago

Only if your code were SIMD aligned. But that is not code that your typical compiler outputs.

Most SIMD code is heavy number-crunching stuff like multimedia or GPU shaders. But much of that low-level work is handled off the CPU on phone platforms. It is simply more power efficient to have a hardware decoder for multimedia.

colanderman 4573 days ago

> Only if your code were SIMD aligned. But that is not code that your typical compiler outputs.

GCC has supported autovectorization for a while now.

"Unoptimized code will be slow" isn't a great argument anyway. There's not much a processor can do to help that.

dragontamer 4573 days ago

Only with proper pragmas and loop constructs.

Besides, most ARM chips supported vectorized code anyway. You know, NEON? http://www.arm.com/products/processors/technologies/neon.php

ARM64 does NOT grant you the vectorized instruction advantage. Qualcomm's Snapdragon Krait cores have supported NEON for some time already.

http://www.anandtech.com/show/5559/qualcomm-snapdragon-s4-kr...

So many people are arguing here, but clearly few of you people have even worked with ARM chips at the assembly level.

colanderman 4573 days ago

> Only with proper pragmas and loop constructs.

(1) it's enabled by default at -O3; (2) the loop constructs needed are fairly simple; (3) arguing that unoptimized code will be slow is still a poor argument.
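
As a rough illustration of point (2), here is a minimal sketch of the kind of loop GCC's autovectorizer can typically handle without any pragmas. The function name `saxpy` and its signature are made up for this example; `restrict` is used on the assumption that the two arrays never overlap, which spares the compiler from having to prove or check that itself.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
 * A plain counted loop with unit stride and no cross-iteration
 * dependency; `restrict` promises that x and y do not alias. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Building with `gcc -O3` turns the vectorizer on; recent GCC versions also accept `-fopt-info-vec` to report which loops were vectorized. Whether this particular loop gets vectorized still depends on the target and the flags in use.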

> ARM64 does NOT grant you the vectorized instruction advantage.

32-bit ARM NEON does not support vectorized doubles. 64-bit ARM NEON does. Source: http://en.wikipedia.org/wiki/ARM_NEON#Advanced_SIMD_.28NEON....
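
To make the difference concrete, here is a small sketch using the `float64x2_t` intrinsics from `arm_neon.h`, which exist on AArch64 only. The function name `add_doubles` is hypothetical, and the `#if` guard is an assumption about how one might keep the same source building on other targets, where it falls back to a scalar loop.

```c
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for doubles. */
void add_doubles(size_t n, const double *a, const double *b, double *out)
{
    size_t i = 0;
#if defined(__aarch64__)
    /* AArch64 Advanced SIMD: two doubles per 128-bit register. */
    for (; i + 2 <= n; i += 2) {
        float64x2_t va = vld1q_f64(a + i);
        float64x2_t vb = vld1q_f64(b + i);
        vst1q_f64(out + i, vaddq_f64(va, vb));
    }
#endif
    /* Scalar tail on AArch64; the whole loop on every other target. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```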

> So many people are arguing here, but clearly few of you people have even worked with ARM chips at the assembly level.

Yep. Thankfully I can back up my arguments with quoted facts.

EDIT: I already granted that vectorized and floating-point operations don't necessarily benefit from larger register widths, so I don't know why you're even arguing. Besides, the OP wasn't even asking specifically about ARM!
