by janwas 37 days ago
In such discussions, whenever you say that abstractions are universally "pretty poor", I think this hyperbole can do real damage, to the extent anyone is listening. Maybe it prevents people from getting relevant performance gains, even if not 100% of the optimum, which is anyway unattainable. And what is the alternative? Not many projects can afford to hand-write intrinsics for all platforms. And are you aware that Highway is basically a thin wrapper over intrinsics, which you can still drop down to where it helps?
3 comments

I am aware of Highway. It doesn’t add much value for the kind of SIMD code I write. I have better abstractions because I don’t have to consider portability nearly as much. Some useful constructions don’t have a good expression on weaker SIMD architectures.
To be clear, "better abstractions" here seems to mean macros for assembly language. To each their own.

What bothers me is advocating for this, or denigrating more generally useful alternatives, without mentioning the very narrow niche where this sits.

Video codecs only change every few years. This makes it more worthwhile/feasible to spend eng time on a few kernels.

Even then, not supporting SVE (you don't, right?) gives less incentive for the Arm CPU ecosystem to invest in it, helping keep us stuck in the NEON local minimum. Not ideal :/

> 100% of the optimum, which is anyway unattainable.

Can you expand on this? Sounds like an interesting discussion.

:) I figure there is always something left to improve. For some kernels which really want to keep 30+ live registers, the compiler might not do as good a job as careful manual tuning, so intrinsics can have a bit of a cost. But I also figure optimization time is limited, so better to get 90% of several kernels rather than one to 99%.
Not who you asked, but I think the meaning is that since SIMD intrinsics differ on each platform, having something that is portable and sometimes runs faster is worth something, while writing separately for Intel, ARM, and a zoo of instruction sets is not an option for some.
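To make the "thin wrapper over intrinsics" idea concrete, here is a minimal sketch. It is not Highway's API: the `sketch` namespace and the names `Vec`, `Load`, `Store`, `Add`, and `AddArrays` are hypothetical, made up for illustration. It assumes a compiler that defines `__SSE__` or `__ARM_NEON` on the respective targets, and it falls back to scalar code elsewhere.

```cpp
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (x86)
#elif defined(__ARM_NEON)
#include <arm_neon.h>   // NEON intrinsics (Arm)
#endif

namespace sketch {  // hypothetical namespace, not a real library

#if defined(__SSE__)
using Vec = __m128;  // four float lanes
constexpr std::size_t kLanes = 4;
inline Vec Load(const float* p) { return _mm_loadu_ps(p); }
inline void Store(Vec v, float* p) { _mm_storeu_ps(p, v); }
inline Vec Add(Vec a, Vec b) { return _mm_add_ps(a, b); }
#elif defined(__ARM_NEON)
using Vec = float32x4_t;  // four float lanes
constexpr std::size_t kLanes = 4;
inline Vec Load(const float* p) { return vld1q_f32(p); }
inline void Store(Vec v, float* p) { vst1q_f32(p, v); }
inline Vec Add(Vec a, Vec b) { return vaddq_f32(a, b); }
#else
// Scalar fallback: a one-lane "vector" so the same kernel still compiles.
struct Vec { float lane; };
constexpr std::size_t kLanes = 1;
inline Vec Load(const float* p) { return Vec{*p}; }
inline void Store(Vec v, float* p) { *p = v.lane; }
inline Vec Add(Vec a, Vec b) { return Vec{a.lane + b.lane}; }
#endif

// One kernel, written once against the wrapper. Each wrapper call maps to a
// single intrinsic on the SIMD targets above.
inline void AddArrays(const float* a, const float* b, float* out,
                      std::size_t n) {
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes) {
    Store(Add(Load(a + i), Load(b + i)), out + i);
  }
  for (; i < n; ++i) {  // remainder elements that do not fill a vector
    out[i] = a[i] + b[i];
  }
}

}  // namespace sketch
```

Because `Vec` is just an alias for the native vector type on the SIMD targets, nothing stops a kernel from calling a platform intrinsic directly on it inside an `#if` where the portable operation falls short. That is the "drop down where it helps" part.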
Besides Spolsky's law of leaky abstractions, "abstractions" can also result in "lowest common denominator" situations, which are the opposite of performance optimization. Talking negatively about abstractions is not what deals damage; you are shooting the messenger here. It's the abstractions themselves that deal damage when misplaced. "Zero-cost abstractions" is the true hyperbole.
Is this a good-faith reply? The particular abstraction we built, which is the one being discussed, is manifestly and obviously not a lowest common denominator. Looks like you are deploying a second straw man, that of zero cost. In other comments here I acknowledge a cost to intrinsics.