by danlark 1114 days ago
I am the author of this trick as well

You can read about it in https://community.arm.com/arm-community-blogs/b/infrastructu...

4 comments

Wow. I love your work. Thank you for coming here and talking about it. You could write Hacker's Delight 2nd edition for a new generation.
Hacker's Delight already has a 2nd edition.

https://www.oreilly.com/library/view/hackers-delight-second/...

I would totally love to read a modern `Hacker's Delight`. My mind was blown the first time I learned about low-level optimizations. I wish I did more of that day to day.
Let's add it to ClickHouse: https://github.com/ClickHouse/ClickHouse/blob/master/base/ba...

It should significantly improve the performance on ARM.

The VSHRN trick is nice (I used it only two hours ago!), but it really does feel like a crutch; I don't understand why they couldn't simply implement a PMOVMSKB-like instruction to begin with (it cannot possibly be very expensive in silicon, at least not if it moved into a vector register). One-bit-per-byte is really the sweet spot for almost any kind of text manipulation, and often requires less setup/post-fixup on either side of the PMOVMSKB/VSHRN.
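
For readers who have not seen the two mask shapes side by side, here is a rough scalar model in C, not real SIMD code. It assumes the trick is the usual "shift each 16-bit lane right by 4 and narrow" form (what the `vshrn_n_u16(..., 4)` intrinsic computes) applied to a byte-compare result of 0x00/0xFF values. The function names (`movemask_model`, `shrn4_model`, `first_match_model`, `ctz64_model`) are made up for illustration and come from no library.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of x86 PMOVMSKB: one bit per byte, taken from each byte's top bit. */
static uint16_t movemask_model(const uint8_t cmp[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        m |= (uint16_t)((cmp[i] >> 7) << i);
    return m;
}

/* Scalar model of the narrowing shift: treat the 16 bytes as eight
 * little-endian 16-bit lanes, shift each right by 4, keep the low 8 bits.
 * With 0x00/0xFF inputs, byte i of the input ends up as nibble i of the
 * 64-bit result (four bits per byte instead of one). */
static uint64_t shrn4_model(const uint8_t cmp[16]) {
    uint64_t m = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint16_t v = (uint16_t)(cmp[2 * lane] | (cmp[2 * lane + 1] << 8));
        uint8_t narrowed = (uint8_t)(v >> 4);
        m |= (uint64_t)narrowed << (8 * lane);
    }
    return m;
}

/* Portable count-trailing-zeros; real code would use a compiler builtin. */
static int ctz64_model(uint64_t m) {
    assert(m != 0);
    int n = 0;
    while (((m >> n) & 1) == 0)
        n++;
    return n;
}

/* Index of the first byte equal to needle, or 16 if there is none. */
static int first_match_model(const uint8_t data[16], uint8_t needle) {
    uint8_t cmp[16];
    for (int i = 0; i < 16; i++)
        cmp[i] = (data[i] == needle) ? 0xFF : 0x00;  /* models a byte compare */
    uint64_t m = shrn4_model(cmp);
    if (m == 0)
        return 16;
    return ctz64_model(m) >> 2;  /* four mask bits per byte, so divide by 4 */
}
```

The `>> 2` at the end is the kind of post-fixup mentioned above: with a one-bit-per-byte mask the trailing-zero count is already the byte index, while the nibble mask needs the extra divide by four.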
> However, developers often encounter problems with Arm NEON instructions being expensive to move to scalar code and back.

I remember talking to an ARM engineer easily 10 years ago, and he told us in that nice British accent: "You know, NEON is like 'back in the yard'" :-D. This has changed a lot, but not enough, judging from what you wrote... A bit sad that these SIMD optimizations are still handwritten...