Hacker News
by jasonthorsness 470 days ago
I am wondering if soon, rather than rely on finicky auto-vectorization, we’ll just have LLMs help “hand-optimize” more routines. Just as memcmp and memcpy are optimized by hand today, maybe something like 20% of a program could be LLM-assisted assembly. @ffmpeg on X thinks maybe they are starting to get it [1], and I had some success having an LLM generate working WebAssembly [2].

[1] https://x.com/ffmpeg/status/1898408922769223994?s=46

[2] https://www.jasonthorsness.com/24


Instead of replacing one finicky and temperamental approach (auto-vectorizers) with another (LLM codegen), I'd much rather see more exploration of explicit SIMD abstractions like Intel's ISPC language. Shader languages for GPUs had this figured out forever ago; there is a sensible middle ground to be had between brittle compiler magic and no compiler at all.
It does seem odd that languages or standard libraries haven’t embraced some of the most common SIMD instructions more than they have, after so many decades of the instructions being available in most processors. I’ve used dotnet’s Vector libraries a bit, which try to auto-adapt to register length and fall back to software on chips that don’t support the hardware instructions; they can still be pretty unwieldy and sometimes you have to use the fixed-size ones anyway. Will take a look at ISPC.
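For anyone unfamiliar with the "auto-adapt to register length" idea, here is a rough sketch of the general pattern in Rust rather than .NET. It is only an illustration: `sum_lanes` and the `LANES` parameter are hypothetical names, not part of any library, and the code relies on the compiler to turn the fixed-width inner loop into SIMD instructions (or plain scalar code where none exist) instead of calling a vector API directly.

```rust
/// Hypothetical helper (not a library function): sums `data` using
/// `LANES` independent accumulators, mimicking a vector register that
/// holds `LANES` floats. `LANES` must be greater than zero, because
/// `chunks_exact(0)` panics.
pub fn sum_lanes<const LANES: usize>(data: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];

    // Main loop: consume the input in full "vector-width" chunks.
    let mut chunks = data.chunks_exact(LANES);
    for chunk in &mut chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Scalar tail: whatever did not fill a whole chunk.
    let tail: f32 = chunks.remainder().iter().sum();

    // Horizontal reduction of the per-lane accumulators.
    acc.iter().sum::<f32>() + tail
}
```

The same source works for `sum_lanes::<4>(&data)` or `sum_lanes::<8>(&data)`, which is roughly the convenience a width-agnostic vector type is after. Note that the per-lane accumulation reorders floating-point additions, so results can differ slightly from a straight left-to-right sum.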
ISPC is a step above the .NET Vector stuff; it's more or less a GPU shader language, except it compiles down to SSE/AVX/NEON code instead. In fact I think it was originally envisioned to be a shader language for Intel's ill-fated Larrabee GPU, since that was just meant to be an extremely wide x86 chip.
Larrabee influenced ispc but was dead by that time. https://pharr.org/matt/blog/2018/04/18/ispc-origins
Indeed. .NET's SIMD primitives and platform intrinsics are more like the building blocks on top of which an ISPC-like framework could've been built in .NET (more likely in F#, since it's more flexible for libraries that want DSL-like customization).
As far as the question in the OP is concerned, ISPC wouldn't help at all, as it'd necessarily compile down to basically equivalent LLVM IR.