	by astrange 1295 days ago
	Tell them to read the ffmpeg code. All the platform-specific/SIMD stuff is done in asm. This isn't only because it's faster; it's honestly easier to read than intrinsics anyway. What it does lack is debuggability.

6 comments

MaxBarraclough 1295 days ago

Or any other highly optimised numerical codebase. From a quick glance at OpenBLAS, it looks like they have a lot of microarchitecture-specific assembly code, with dispatching code to pick out the appropriate implementations.

https://github.com/xianyi/OpenBLAS/blob/02ea3db8e720b0ffb3e2...
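
To illustrate the dispatching idea, here is a minimal sketch. It is not OpenBLAS's actual code: the names `dot`, `dot_generic`, `dot_unrolled` and `select_dot` are made up for this example, and the "tuned" kernel is only an unrolled stand-in for what would really be hand-written assembly. It assumes GCC or Clang for the `__builtin_cpu_supports` check and falls back to the generic kernel everywhere else.

```cpp
#include <cstddef>

// Hypothetical kernel signature: dot product of two float arrays.
using DotFn = float (*)(const float*, const float*, std::size_t);

// Portable baseline implementation.
static float dot_generic(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Stand-in for a microarchitecture-specific kernel. A real library would
// put hand-tuned assembly here; this version only unrolls by four.
static float dot_unrolled(const float* a, const float* b, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) sum += a[i] * b[i];  // leftover elements
    return sum;
}

// Pick an implementation based on what the running CPU reports.
static DotFn select_dot() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return dot_unrolled;
#endif
    return dot_generic;
}

// Public entry point: the choice is made once, on the first call.
float dot(const float* a, const float* b, std::size_t n) {
    static const DotFn impl = select_dot();
    return impl(a, b, n);
}
```

Callers only ever see `dot`; which kernel runs behind it depends on the machine, which is the general shape of the dispatching described above.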


teux 1295 days ago

For debugging you can actually use gdb's TUI in assembly mode and step through the instructions! You can even get it hooked up in VS Code and remote debug an embedded target using the full IDE. Full register view, watch registers for changes, breakpoints, step instruction to instruction.

Pipelining and optimisations can make the intrinsics a bit fucky though, so you have to make sure it's built with -O0 as a proper debug compilation.

I have line by line debugged raw assembly many times. It’s just a pain to initially set up. Honestly not very different from C/C++ debugging once running.


astrange 1295 days ago

Sure, but gdb doesn't know what the function parameters are or, on some platforms, where functions start and end. Crashes don't have source lines, and ASan doesn't work (though of course valgrind does).


bitwalker 1295 days ago

If you are handwriting the function in assembly, you'll know which registers hold the function parameters and what types of values they are supposed to be. With care, you can also produce debug information and CFI directives to allow for stack unwinding. It's just annoying to do, but that's the tradeoff you make for the performance improvement, I suppose.


variadix 1294 days ago

I don’t know if this is frowned upon or not among assembly programmers, but I often just use naked functions in C with asm bodies, which gdb will provide the args for, rather than linking against a separate assembly file.


saagarjha 1295 days ago

If you write your assembly to look like C code, GDB is more than happy to provide you with much of that to the extent that it can. In particular, it will identify functions and source mappings from debug symbols.


saagarjha 1295 days ago

Pipelining and optimizations…when reading in the debugger? I don't quite understand how this is relevant.


wahern 1295 days ago

ffmpeg might have amazingly efficient inner loops (i.e. low-level decoding/encoding), but the broader architecture (e.g. the memory buffer implementations) is quite inefficient. As with the low-level media code, it's not that each component itself is inefficient; it's that the interfaces and control flow semantics between them obstruct both compiler and architectural optimizations.

When I wrote a transcoding multimedia server I ended up writing my own framework and simply pulling in the low-level decoders/encoders, most of which are maintained as separate libraries. I ended up being able to push at least an order of magnitude more streams through the server than if I had used ffmpeg (more specifically, libavcodec) itself, even though I still effectively ended up with an abstraction layer intermediating encoder and format types. And I never wrote a single line of assembly.

There's no secret sauce to optimization: it's not about using assembly, fancier data structures, etc; it's learning to identify impedance mismatches, and those exist up and down the stack. Sometimes a "dumber" data structure or algorithm can create opportunities (for the developer, for the compiler) for more harmonious data and code flow. And impedance mismatches sometimes exist beyond the code--e.g. a mismatch between functionality and technical capabilities, where your best option might be to redefine the problem, which can often be done without significantly changing how users experience the end product.


astrange 1295 days ago

> most of which are maintained as separate libraries

This is so confusing I can’t tell if you’re actually talking about libavcodec. The whole point is to combine codecs to share common code; “most” decoders certainly aren’t available elsewhere.

If you just want to call libx264 directly, go ahead and do that of course. libx264 uses assembly as much as or more than libavcodec though.


janwas 1295 days ago

I have a lot of sympathy for wanting efficient code. But let's indeed have a look: https://github.com/FFmpeg/FFmpeg/blob/7bbad32d5ab69cb52bc92a... There are so many macros, %if and clutter here that it's difficult (for me?) to keep the big picture in mind.

This reminds me of a retrospective of an OS/window manager written in assembly - they were great about avoiding tiny overheads, but expressed regret that the whole system ended up slow because it was hard to reason about bigger things such as how often to redraw everything, similar to what people are saying here.

To be clear: let's indeed optimize and vectorize, but better to build on intrinsics than go all the way down to assembly.


Const-me 1295 days ago

I prefer intrinsics over assembly.

There are too many different assembly flavors: inline, MASM, NASM, FASM, YASM. They each come with their own quirks, and they complicate the build.

Intrinsics are more portable. It's trivial to re-compile legacy SSE intrinsics into AVX1. You won't automatically get 32-byte vectors this way, but you will get VEX encoding, broadcasts for _mm_set1_something, and more.
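
A minimal sketch of what that recompilation looks like in practice. The function name `scale` is hypothetical, and the sketch assumes an x86 target with GCC or Clang. The source uses only legacy SSE intrinsics. Built with `-msse2`, the compiler emits legacy-encoded SSE instructions; built with `-mavx`, the same unchanged source should come out VEX-encoded, still on 16-byte vectors. The exact instructions chosen depend on the compiler and its version.

```cpp
#include <xmmintrin.h>  // legacy SSE intrinsics (x86 only)
#include <cstddef>

// Hypothetical helper: multiply n floats in place by a scalar,
// four lanes at a time, with a scalar loop for the leftovers.
void scale(float* data, std::size_t n, float factor) {
    const __m128 f = _mm_set1_ps(factor);  // same value in all 4 lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(data + i);          // unaligned load
        _mm_storeu_ps(data + i, _mm_mul_ps(x, f));  // multiply, store back
    }
    for (; i < n; ++i) data[i] *= factor;  // scalar tail
}
```

Comparing the disassembly of the two builds (for example with `objdump -d`) is a quick way to see the difference in encoding.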

Readability depends on the code style. When you write intrinsics using "assembly with types" style, actual assembly is indeed more readable. OTOH, with C++ it's possible to make intrinsics way better than assembly: arithmetic operators instead of vaddpd/vsubpd/vmulpd/vdivpd, strongly-typed classes wrapping low-level vectors for specific use cases, etc.
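
A minimal sketch of the C++ style described here, assuming an x86 target with SSE2. The `Vec2d` wrapper and the `lerp` helper are invented for this example and are not taken from any particular library.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)

// Hypothetical strongly-typed wrapper around a vector of two doubles.
struct Vec2d {
    __m128d v;

    Vec2d(__m128d raw) : v(raw) {}
    explicit Vec2d(double x) : v(_mm_set1_pd(x)) {}  // broadcast to both lanes

    static Vec2d load(const double* p) { return Vec2d(_mm_loadu_pd(p)); }
    void store(double* p) const { _mm_storeu_pd(p, v); }
};

// Arithmetic operators in place of addpd/subpd/mulpd/divpd.
inline Vec2d operator+(Vec2d a, Vec2d b) { return Vec2d(_mm_add_pd(a.v, b.v)); }
inline Vec2d operator-(Vec2d a, Vec2d b) { return Vec2d(_mm_sub_pd(a.v, b.v)); }
inline Vec2d operator*(Vec2d a, Vec2d b) { return Vec2d(_mm_mul_pd(a.v, b.v)); }
inline Vec2d operator/(Vec2d a, Vec2d b) { return Vec2d(_mm_div_pd(a.v, b.v)); }

// Linear interpolation, written the way the math reads.
inline Vec2d lerp(Vec2d a, Vec2d b, Vec2d t) { return a + (b - a) * t; }
```

The "assembly with types" version of `lerp` would be `_mm_add_pd(a, _mm_mul_pd(_mm_sub_pd(b, a), t))`, which is the readability gap being pointed at.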

Update: most real-life functions contain scalar code (like loops) as well as code the compiler normally generates automatically (stack frame setup, backup/restore of non-volatile registers). When coding non-inline assembly, the developer has to write all of that by hand, which is hard to get right and can cause bugs like these: https://github.com/openssl/openssl/issues/12328 https://news.ycombinator.com/item?id=33705209


saagarjha 1295 days ago

FFmpeg code is god-awful. A lot of it is from like 2002 and written without regard to any sort of "sanity". People who write assembly routines these days have a structure to their code, and if they overrun buffers or whatever they'll document what alignment assumptions they're making. FFmpeg will just start patching its own code at runtime because someone thought it was a good idea on Pentium processors.
