Hacker News

by jesse__ 329 days ago

There's a lot to discuss.

First off, a number of statements are nonsense. Take, for example:

> you shouldn't be writing SIMD instructions directly unless you're writing a SIMD library or an optimizing compiler.

Why would writing an optimizing compiler qualify as territory for directly writing SIMD code, but anything else is off the table? That makes no sense at all.

Furthermore, I was writing a library. It's just embedded in my game engine.

> Instead you should reach for one of the many available libraries

This blanket statement is only true in a narrow set of circumstances. In my mind, it requires that you ship on multiple architectures and probably multiple compilers. If you have narrower constraints, it's extremely easy to write your own wrappers (like I did) and not take a dependency. A good trade IMO. Furthermore, someone's got to write the libraries, so doing it yourself as a learning exercise has value.
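As a rough illustration of how small such a wrapper can be, here is a minimal sketch that assumes GCC/Clang vector extensions (`__attribute__((vector_size(16)))`). This is not the wrapper from the article. The type name `f32_4x` and the helpers `Broadcast`, `Load` and `Store` are hypothetical.

```cpp
#include <cstring>

// Assumption: GCC/Clang vector extensions. The compiler maps this type onto
// SSE, NEON, etc. when the target has them, and falls back to scalar code
// otherwise. Illustrative only; not the wrapper from the article.
typedef float f32x4_native __attribute__((vector_size(16)));

// Hypothetical 4-lane float type: a thin struct so the public API stays
// stable even if the underlying representation changes per target.
struct f32_4x {
    f32x4_native v;
};

inline f32_4x Broadcast(float x) {
    f32x4_native r = {x, x, x, x};  // same value in every lane
    return {r};
}

inline f32_4x Load(const float* p) {
    f32_4x r;
    std::memcpy(&r.v, p, sizeof(r.v));  // no alignment requirement on p
    return r;
}

inline void Store(float* p, f32_4x a) { std::memcpy(p, &a.v, sizeof(a.v)); }

// Element-wise arithmetic; on targets with 128-bit float SIMD each operator
// typically lowers to a single vector instruction.
inline f32_4x operator+(f32_4x a, f32_4x b) { return {a.v + b.v}; }
inline f32_4x operator-(f32_4x a, f32_4x b) { return {a.v - b.v}; }
inline f32_4x operator*(f32_4x a, f32_4x b) { return {a.v * b.v}; }
```

With this in place, a call such as `Store(out, Load(in) * Broadcast(2.0f) + Broadcast(1.0f))` reads much like scalar code. If only one architecture and one or two compilers are targeted, supporting a new target mostly means changing the typedef, not the call sites.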

> There are loads of libraries like this [...] and provide targeting for a vast trove of SIMD options without hand-writing for every option.

The original commentor seems to be under the impression that using a SIMD library would somehow have produced a better result. The fact is, the library code is super fucking boring. I barely mentioned it in the article because it's basically just boilerplate an LLM could probably spit out, first try. The interesting part of the series is the observation that you can precompute a matrix of intermediates and look them up, instead of recomputing them in the hot loop, effectively trading memory bandwidth for fewer instructions. A good trade for this algorithm, which saturates the instruction pipelines.
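The precompute-and-look-up trade can be shown in plain scalar code. The sketch below is generic and hypothetical, and it does not reproduce the article's algorithm or table. `Weight()` stands in for any intermediate that depends only on small loop indices. That lets it be computed once into a K*K table and loaded in the hot loop instead of being recomputed for every output sample. Whether this wins depends on the table staying in cache. It pays off when the loop is limited by instruction throughput rather than by memory.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical kernel size: each output sums a K x K neighbourhood.
constexpr int K = 4;

// Stand-in for an expensive intermediate that depends only on the tap
// offsets (dx, dy), never on the sample being processed.
inline float Weight(int dx, int dy) {
    float d = std::sqrt(float(dx * dx + dy * dy));
    return std::exp(-0.5f * d);
}

// Recompute version: Weight() runs K*K times for every output sample.
float SumRecompute(const std::vector<float>& src, int width, int x, int y) {
    float acc = 0.0f;
    for (int dy = 0; dy < K; ++dy)
        for (int dx = 0; dx < K; ++dx)
            acc += src[(y + dy) * width + (x + dx)] * Weight(dx, dy);
    return acc;
}

// Precomputed version: the K*K weights are built once, then only loaded.
struct WeightTable {
    std::array<float, K * K> w;
    WeightTable() {
        for (int dy = 0; dy < K; ++dy)
            for (int dx = 0; dx < K; ++dx)
                w[dy * K + dx] = Weight(dx, dy);
    }
};

float SumLookup(const WeightTable& t, const std::vector<float>& src,
                int width, int x, int y) {
    float acc = 0.0f;
    for (int dy = 0; dy < K; ++dy)
        for (int dx = 0; dx < K; ++dx)
            acc += src[(y + dy) * width + (x + dx)] * t.w[dy * K + dx];
    return acc;
}
```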

The thing the original commentor does get right is the notion that thinking about data layout is important. But that has nothing to do with the library you're using; you just have to do it. They seem to be conflating the use of a library with the act of writing wide code, as if you can't do one without the other, which is obviously false.
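To illustrate that layout is independent of any library, here is a hypothetical array-of-structures vs structure-of-arrays sketch in plain C++. The particle types are invented for illustration. The SoA loop is unit-stride, which is what lets either a compiler or a hand-written wrapper map consecutive elements onto SIMD lanes.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structures: x, y, z of one particle sit next to each other,
// so loading four consecutive x values needs a gather or shuffles.
struct ParticleAoS {
    float x, y, z;
};

// Structure-of-arrays: all x values are contiguous, so lanes map directly
// onto consecutive elements and plain vector loads work.
struct ParticlesSoA {
    std::vector<float> x, y, z;
};

void AdvanceAoS(std::vector<ParticleAoS>& p, float dx) {
    for (auto& e : p) e.x += dx;  // stride of sizeof(ParticleAoS) bytes
}

void AdvanceSoA(ParticlesSoA& p, float dx) {
    // Contiguous, unit-stride loop: easy to vectorize with or without a library.
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] += dx;
}
```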

> I was going to quickly rewrite the example in Highway ..

Right. I'll believe this when I see it.

I could pick it apart more, but.. I think you get my drift.

janwas 329 days ago

Thanks for expanding on your viewpoint.

> Why would writing an optimizing compiler qualify as territory for directly writing SIMD code, but anything else is off the table?

I understood "directly writing" to mean assembly or even intrinsics. In general, I would advise not touching intrinsics directly, because the intrinsic definitions themselves have in several cases had bugs. Here's one AVX-512 example: https://github.com/google/highway/commit/7da2b760c012db04103....

When using a wrapper library, these can be fixed in one spot, but every direct user of intrinsics has to deal with it themselves.

> it's extremely easy to write your own wrappers (like I did) and not take a dependency. A good trade IMO

I understand wanting to reduce dependencies. The tradeoff is a bit more complex: for example many readers would be familiar with Highway terminology. We have also made efforts to be a lightweight dependency :)

> doing it yourself as a learning exercise has value.

Understandable :) Though it's a bit regrettable to tie your user code to the library prototype - if used elsewhere, it would probably have to be ported.

> The fact is, the library code is super fucking boring.

True for many ops. However, emulating AES or other complex ops is nontrivial. And it is easy to underestimate the sheer toil of keeping things working across compiler versions and their many bugs. We recently hit the 3000 commit mark in Highway :)

jesse__ 328 days ago

Generally agree, especially with the sentiment that it's a huge PITA maintaining something like this across multiple compilers & platforms.

Out of curiosity, does highway implement integer divide?

janwas 328 days ago

:) Yes indeed, it's about 500 LOC in https://github.com/google/highway/blob/master/hwy/ops/generi....

jesse__ 328 days ago

Nice. Looks like it handles quite a bit. I just supported a single div op, which was enough for my needs.

https://github.com/scallyw4g/bonsai_stdlib/blob/71fadd0f1fce...
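For context on why integer divide comes up at all: x86 SIMD has no packed integer division instruction, so wrappers have to emulate it. The sketch below shows one common emulation for 32-bit lanes: widen to double, divide, and truncate toward zero. It is not taken from Highway or bonsai_stdlib, whose linked sources are truncated above. The `i32_4x` type and `Div` are hypothetical, and the loop stands in for what a real wrapper would do with packed convert and divide instructions. The result is exact for all int32 inputs because a double holds every int32 exactly and the correctly rounded quotient never crosses an integer boundary. The exceptions are `b == 0` and `INT32_MIN / -1`, which are undefined for scalar int32 division as well.

```cpp
#include <cstdint>

// Hypothetical 4-lane int32 type; a real wrapper would hold a native vector.
struct i32_4x {
    int32_t lane[4];
};

// Lane-wise int32 division via double: convert, divide, truncate toward zero
// (matching C++ integer division). Undefined for b == 0 and INT32_MIN / -1.
inline i32_4x Div(i32_4x a, i32_4x b) {
    i32_4x r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = static_cast<int32_t>(static_cast<double>(a.lane[i]) /
                                         static_cast<double>(b.lane[i]));
    return r;
}
```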

llm_nerd 329 days ago

>First off, a number of statements are nonsense.

100% of my original comment is absolutely and completely correct. Indisputably correct.

>Furthermore, I was writing a library.

Little misunderstandings like this pervade your take.

>seems to be under the impression that using a SIMD library would somehow have produced a better result.

To be clear, I wasn't speaking to you or for your benefit, or specifically to your exercise. You'll notice I didn't email a list of recommendations to you, because I do not care what you do or how you do it. I didn't address my comment to you.

I -- and I was abundantly clear on this -- was speaking to the random reader who might be considering optimizing their code with some hand-crafted SIMD. My point was that following the path in this submission (and an endless chain of similar ones) is usually ill-advised in general; I wasn't even speaking to this specific project, but rather to the average "I want to take advantage of SIMD in my code" consideration.

HN has had a fetish for SIMD code recently, and there is almost always a better approach than hand-crafting some SSE3 calls in one's random project.

>The original commentor seems to be under the impression that using a SIMD library would somehow have produced a better result.

Again, I could not care less about your project. But the average developer does care that their code runs optimally on a wide variety of platforms. You don't, but again, you and your project were tangential to my comment, which was general.

>The thing the original commentor does get right is the notion that thinking about data layout is important.

Aside from the entirety of my comment being correct, the point was that many of the SIMD tools and libraries force you down a path where you are coerced into such structures, as opposed to relying on the compiler to make the best of suboptimal structures. We've seen many times where people complain that their compiler isn't vectorizing things that they think it should, but between endlessly fighting with the compiler and hand-rolling SSE calls there is a third option, one that not only supports much more hardware but also leads you down the path of best practices.

Which is of course why C++26 is getting std::simd.

Again, you are irrelevant to my comment. Your project is irrelevant to it. I know this is tough to stomach.

>Right. I'll believe this when I see it.

I actually cloned the project, but then this submission fell off the front page and it seemed not worth my time. Not to mention that it can't be built on macOS, which happened to be the machine I was on at the moment.

Because again, I don't care about you or your project, and my commentary was aimed at the SIMD sideliners considering how to approach it.

>I could pick it apart more, but.. I think you get my drift.

None of your retorts are valid, and my comment stands as completely correct. The drift is that you feel defensive about a general comment because you did something different, which....eh.

janwas 329 days ago

I appreciate your efforts to nudge readers towards SoA data structures and varying SIMD widths. FWIW I have observed that advice is more effective if communicated with some kindness.
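In the spirit of that advice, here is a hypothetical sketch of a kernel written once against a lane count `N` and instantiated per target width (for example 4 for 128-bit float vectors, 8 for AVX2, 16 for AVX-512). It is plain scalar C++ standing in for what a wrapper library does with real N-lane registers. Real libraries expose the lane count through their own APIs rather than a template parameter like this, so the loop shape is the point here, not the API. `Saxpy` is an illustrative name.

```cpp
#include <cstddef>

// Hypothetical width-agnostic kernel: y = a * x + y. The body is written
// once; only the lane count N changes per target.
template <std::size_t N>
void Saxpy(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
    // Main loop over whole vectors of N lanes. A SIMD wrapper would replace
    // the inner loop with one load/mul/add/store on an N-lane register.
    for (; i + N <= n; i += N)
        for (std::size_t l = 0; l < N; ++l)
            y[i + l] = a * x[i + l] + y[i + l];
    // Remainder: fewer than N elements left, handled one at a time.
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```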

jesse__ 329 days ago

lol, alright dude. Good luck with C++26

llm_nerd 329 days ago

Delicious snark. Humorously, I only mentioned C++26 because the approach is being formalized right into the standard -- it is so painfully obvious and necessary -- but of course I already mentioned a number of excellent existing solutions like Highway, so again you either have no idea what you're reading, or choose not to.

Cheers!