by vaibhavsagar 2477 days ago
ARM NEON includes it as VCNT: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....

Huh, it looks like that only works on 1-byte values? That’s an interesting choice.
Worse, it's fertile ground for "interesting" bugs, because VADDV (which sum-reduces the result) reduces into an 8-bit uint. So if you e.g. accumulate two or more quadword VCNTs into a uint8x16_t and then VADDV it, you could end up with something other than the actual overall bit count (because 2 quadwords can have _256_ bits set, and 256 doesn't fit in 8 bits). Same with accumulating 32 or more VCNTs, except now individual bytes could wrap around if you don't widen in between (each byte lane gains up to 8 per VCNT, and 32 * 8 = 256).
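
To make the wraparound concrete, here's a rough sketch in portable C. It is a scalar model of the lane arithmetic, not real NEON code: every function name in it is made up for illustration, and the comments only note which intrinsic each helper is meant to imitate (assuming the AArch64 `arm_neon.h` names `vcntq_u8`, `vaddq_u8`, `vaddvq_u8` and `vaddlvq_u8`).

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 16 }; /* one quadword = 16 byte lanes */

/* Per-byte popcount; models what VCNT (vcntq_u8) does to each lane. */
static void model_vcnt(const uint8_t in[LANES], uint8_t out[LANES]) {
    for (size_t i = 0; i < LANES; i++) {
        uint8_t v = in[i], c = 0;
        while (v) {
            c = (uint8_t)(c + (v & 1u));
            v >>= 1;
        }
        out[i] = c;
    }
}

/* Lane-wise add with 8-bit wraparound; models vaddq_u8. */
static void model_add_u8(uint8_t acc[LANES], const uint8_t x[LANES]) {
    for (size_t i = 0; i < LANES; i++)
        acc[i] = (uint8_t)(acc[i] + x[i]);
}

/* Sum-reduce into 8 bits; models the narrow reduction (vaddvq_u8). */
static uint8_t model_addv_u8(const uint8_t v[LANES]) {
    uint8_t s = 0;
    for (size_t i = 0; i < LANES; i++)
        s = (uint8_t)(s + v[i]);
    return s;
}

/* Sum-reduce into 16 bits; models the widening reduction (vaddlvq_u8). */
static uint16_t model_addlv_u8(const uint8_t v[LANES]) {
    uint16_t s = 0;
    for (size_t i = 0; i < LANES; i++)
        s = (uint16_t)(s + v[i]);
    return s;
}

/* Buggy pattern: accumulate VCNTs in 8-bit lanes, then reduce into 8 bits. */
static unsigned popcount_narrow(const uint8_t *data, size_t n_quadwords) {
    uint8_t acc[LANES] = {0}, cnt[LANES];
    for (size_t q = 0; q < n_quadwords; q++) {
        model_vcnt(data + q * LANES, cnt);
        model_add_u8(acc, cnt);
    }
    return model_addv_u8(acc);
}

/* Safer pattern: widen each quadword's reduction before accumulating. */
static unsigned popcount_widened(const uint8_t *data, size_t n_quadwords) {
    unsigned total = 0;
    uint8_t cnt[LANES];
    for (size_t q = 0; q < n_quadwords; q++) {
        model_vcnt(data + q * LANES, cnt);
        total += model_addlv_u8(cnt);
    }
    return total;
}
```

Feed both functions two quadwords of 0xFF bytes and `popcount_widened` returns 256 while `popcount_narrow` returns 0, since the 8-bit reduction wraps. With a single quadword both agree on 128. Widening after every quadword is the simplest fix, though probably not the fastest one.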