by fearthetelomere 1284 days ago
>You should probably not use inline assembly in production code

What are the alternatives here? Write the assembly in a separate file and provide a FFI?


Popular compilers support popular SIMD architectures through “intrinsic” functions. They look and act like regular functions, but they are built into the compiler and usually compile to a single specific assembly instruction. In the article, _mm_set_epi32 is an intrinsic function, though it is one of the exceptions: no instruction has that name, and the compiler emits whatever short sequence loads the four values into a register.
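To make that concrete, here is a minimal sketch in C, assuming an x86 target with SSE2 and a GCC- or Clang-style compiler. The function names add4_i32 and make_lanes are made up for illustration, and the scalar fallback is only there so the file still builds on other targets.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (x86 / x86-64 only) */
#endif

/* Hypothetical helper: adds two groups of four 32-bit integers lane by lane. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(__SSE2__)
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va  = _mm_loadu_si128((const __m128i *)a);
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    /* The compiler understands this call and is free to schedule it,
       keep values in registers, or constant-fold it. */
    __m128i sum = _mm_add_epi32(va, vb);
    _mm_storeu_si128((__m128i *)out, sum);
#else
    /* Assumed fallback for targets without SSE2. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}

/* Hypothetical helper: shows the argument order of _mm_set_epi32,
   which lists lanes from highest to lowest. */
void make_lanes(int32_t out[4])
{
#if defined(__SSE2__)
    __m128i v = _mm_set_epi32(3, 2, 1, 0);  /* lane 0 holds 0, lane 3 holds 3 */
    _mm_storeu_si128((__m128i *)out, v);
#else
    for (int i = 0; i < 4; i++)
        out[i] = i;
#endif
}
```

Calling add4_i32 on {1, 2, 3, 4} and {10, 20, 30, 40} fills out with {11, 22, 33, 44}, and the whole thing reads like ordinary C even though the hot line maps onto a SIMD add.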

This is in sharp contrast to inline assembly, into which the compiler has practically zero visibility. The compiler can’t schedule or interleave inline assembly with the surrounding work. And it has to fall back to a super-conservative assumption that the inline assembly might have done god-knows-what behind its back.
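A rough illustration of that opacity, assuming GCC-style extended asm with the default AT&T syntax on x86 (add_asm and add_plain are hypothetical names). With extended asm the compiler sees the operand constraints and clobber list, but not what the instruction text does:

```c
#include <stdint.h>

/* Plain C: the compiler can inline this, fold add_plain(2, 3) to 5 at
   compile time, or merge it with neighbouring arithmetic. */
uint32_t add_plain(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Inline assembly: the compiler only knows that a is read and written,
   b is read, and the flags are clobbered. It treats the instruction
   text as an opaque string. */
uint32_t add_asm(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0"   /* AT&T order: source, destination */
            : "+r"(a)       /* %0: read-write register operand */
            : "r"(b)        /* %1: read-only register operand  */
            : "cc");        /* condition codes are modified    */
    return a;
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    return a + b;
#endif
}
```

Both functions return the same value, but only add_plain gives the optimizer anything to reason about. Comparing the generated assembly for a call with constant arguments is a quick way to see the difference.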

AFAICT, the last holdouts for hand-written assembly are people working on media codecs. Even AAA game engines use intrinsic functions rarely and assembly nearly never.

Isn’t the reason they had to use inline assembly there that the compiler they’re using doesn’t have that particular instruction bound as an intrinsic?

What do you do in that case? I’m genuinely curious as it’s something I’ve run up against: the vector extensions for the LX7 processor in the ESP32-S3 don’t have intrinsics for them.

There are intrinsics for a wide range of ARM and PowerPC SIMD instructions, a huge range of Intel SIMD instructions and several useful instructions like ByteSwap or FindFirstSetBit on several architectures.

But there is not an intrinsic for every instruction, nor for useful instructions on every architecture. In those cases, you might be lucky enough to have the compiler recognize very specific patterns in C (compilers are great at recognizing C implementations of byteswap, for example). Otherwise, you’ll have to write inline assembly if you want to use those features.
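For the byteswap case, the pattern looks something like the sketch below. The function names are hypothetical; __builtin_bswap32 is the GCC/Clang builtin, and whether the portable version actually collapses to a single instruction depends on the compiler, target, and optimization level, so it is worth checking the generated assembly.

```c
#include <stdint.h>

/* Portable byte swap written as shifts and masks. Optimizing compilers
   often recognize this shape and emit a single byte-swap instruction
   where the target has one. */
uint32_t bswap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

/* Same operation through the compiler builtin, with the portable
   version as an assumed fallback for other compilers. */
uint32_t bswap32_builtin(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    return bswap32_portable(x);
#endif
}
```

Both functions turn 0x11223344 into 0x44332211. The portable one has the advantage of working on any conforming C compiler, at the cost of relying on pattern recognition for speed.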

You continue doing what you are doing.

Intrinsics in many languages are just a file full of inline ASM somewhere.

There are SIMD abstraction libraries floating around. And many so-called "Math" libraries will use SIMD instructions to speed things up, I believe. So the work is to recast the problem in terms of what the library (or libraries) provides and then do some profiling.