| There's a lot to discuss. First off, a number of statements are nonsense. Take, for example: > you shouldn't be writing SIMD instructions directly unless you're writing a SIMD library or an optimizing compiler. Why would writing an optimizing compiler qualify as territory for directly writing SIMD code, but anything else is off the table? That makes no sense at all. Furthermore, I was writing a library. It's just embedded in my game engine. > Instead you should reach for one of the many available libraries This blanket statement is only true in a narrow set of circumstances. In my mind, it requires that you ship on multiple architectures and probably multiple compilers. If you have narrower constraints, it's extremely easy to write your own wrappers (like I did) and not take a dependency. A good trade IMO. Furthermore, someone's got to write the libraries, so doing it yourself as a learning exercise has value. > There are loads of libraries like this [...] and provide targeting for a vast trove of SIMD options without hand-writing for every option. The original commenter seems to be under the impression that using a SIMD library would somehow have produced a better result. The fact is, the library code is super fucking boring. I barely mentioned it in the article because it's basically just boilerplate an LLM could probably spit out, first try. The interesting part of the series is the observation that you can precompute a matrix of intermediates and look them up, instead of recomputing them in the hot loop, effectively trading memory bandwidth for fewer instructions. A good trade for this algorithm, which saturates the instruction pipelines. The thing the original commenter does get right is the notion that thinking about data layout is important. But that has nothing to do with the library you're using; you just have to do it. 
They seem to be conflating the use of a library with the act of writing wide code, as if you can't do one without the other, which is obviously false. > I was going to quickly rewrite the example in Highway .. Right. I'll believe this when I see it. I could pick it apart more, but.. I think you get my drift. |
> Why would writing an optimizing compiler qualify as territory for directly writing SIMD code, but anything else is off the table?
I understood "directly writing" to mean assembly or even intrinsics. In general, I would advise against using intrinsics directly, because the intrinsic definitions themselves have had bugs in several cases. Here's one AVX-512 example: https://github.com/google/highway/commit/7da2b760c012db04103....
With a wrapper library, such bugs can be worked around in one spot, whereas every direct user of intrinsics has to deal with them individually.
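To make the "one spot" point concrete, here is a minimal sketch of a thin wrapper. It is an illustration only: `mysimd`, `F32x4`, `Load`, `Store`, `Add` and `AddArrays` are hypothetical names, not Highway's API, and it assumes SSE2 on x86 with a scalar fallback everywhere else.

```cpp
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)
#endif

namespace mysimd {  // hypothetical wrapper, not Highway's API

#if defined(__SSE2__)
struct F32x4 { __m128 raw; };

inline F32x4 Load(const float* p) { return F32x4{_mm_loadu_ps(p)}; }
inline void Store(F32x4 v, float* p) { _mm_storeu_ps(p, v.raw); }
// If a compiler's definition of an intrinsic turned out to be buggy, the
// workaround would go here, once, and every caller of Add() would pick it up.
inline F32x4 Add(F32x4 a, F32x4 b) { return F32x4{_mm_add_ps(a.raw, b.raw)}; }
#else
// Scalar fallback with the same interface.
struct F32x4 { float raw[4]; };

inline F32x4 Load(const float* p) {
  F32x4 v;
  for (int i = 0; i < 4; ++i) v.raw[i] = p[i];
  return v;
}
inline void Store(F32x4 v, float* p) {
  for (int i = 0; i < 4; ++i) p[i] = v.raw[i];
}
inline F32x4 Add(F32x4 a, F32x4 b) {
  F32x4 r;
  for (int i = 0; i < 4; ++i) r.raw[i] = a.raw[i] + b.raw[i];
  return r;
}
#endif

}  // namespace mysimd

// User code only sees the wrapper, never the intrinsics.
inline void AddArrays(const float* a, const float* b, float* out,
                      std::size_t n) {
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    mysimd::Store(mysimd::Add(mysimd::Load(a + i), mysimd::Load(b + i)),
                  out + i);
  }
  for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar remainder
}
```

Code written against `_mm_add_ps` directly would need the same fix repeated at every call site, which is the cost being described here.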
> it's extremely easy to write your own wrappers (like I did) and not take a dependency. A good trade IMO
I understand wanting to reduce dependencies. The tradeoff is a bit more complex, though: for example, many readers would already be familiar with Highway terminology. We have also made efforts to be a lightweight dependency :)
> doing it yourself as a learning exercise has value.
Understandable :) Though it's a bit regrettable to tie your user code to the library prototype - if used elsewhere, it would probably have to be ported.
> The fact is, the library code is super fucking boring.
True for many ops. However, emulating AES or other complex ops is nontrivial. And it is easy to underestimate the sheer toil of keeping things working across compiler versions and their many bugs. We recently hit the 3000-commit mark in Highway :)