| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dcsommer 14 days ago

Codecs are difficult and expensive to develop. Therefore they get reused in many contexts, including security critical ones. Sandboxing is shown over and over to not be a great security solution, so what this means in practice is that security-critical software that needs software decoding get pwned because software engineers don't care to prioritize it in the first place.

Why shouldn't safety be the default? If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.

Software decoding has a safety culture problem, and we need to talk about it.

2 comments

cogman10 14 days ago

> Why shouldn't safety be the default?

Because safe code isn't fast enough to decode live video.

> If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.

Yeah, but then you are undermining safety in a critical way that does lead to security vulnerabilities (buffer overflow). And you are also now maintaining and requiring other devs for a project to use a custom version of rustc. That's certainly part of the reason that's simply not happened.

But another major part of it is that encoders end up with a lot of custom ASM regardless. That custom ASM is going to be where vulnerabilities end up. You don't really escape that by using rust.

If you are already abandoning where you critically need safety the most for performance, then why pick a language that additionally penalizes you for using unsafe constructs?

> Software decoding has a safety culture problem, and we need to talk about it.

Compilers and languages have an optimization problem that we need to talk about. SIMD optimizations remain a very hard thing for compilers to get right. We should talk about what it'd take to make compilers better and the reasons for why codec devs need to drop down to asm instead of using a high level compiler.

There might not be a solution to this problem, there are reasons for it.

link

Dylan16807 14 days ago

> Because safe code isn't fast enough to decode live video.

I strongly doubt that.

And if any implementation of AV2 can be "fast enough", then there should be no question at all that we can write "fast enough" safe decoders for every other codec. Absolutely no way safe code is inherently that much slower.

link

cogman10 14 days ago

Show me the AV1, H.265, or even H.264 decoder that doesn't ultimately rely heavily on hand written assembly to achieve "fast enough".

You can doubt all you like. Ultimately, there's a reason why dav1d includes hand coded SIMD for common platforms.

It's simply impossible to get a compiler to emit something like this [1].

[1] https://github.com/videolan/dav1d/blob/master/src/x86/ipred_...

link

Dylan16807 14 days ago

Is it simply impossible to get compiled code within a factor of five? That claim needs strong evidence.

More importantly, if you can show that your assembly code isn't altering pointers it shouldn't alter, and isn't going out of bounds on its reads, you're most of the way to having assembly in your verified safe code. And rough bounds checking with padding can as cheap as a bitmask.

link

cogman10 14 days ago

> Is it simply impossible to get compiled code within a factor of five? That claim needs strong evidence.

1. I didn't make that claim.

2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.

link

Dylan16807 13 days ago

> 1. I didn't make that claim.

Then you didn't read my previous comment correctly. AV2 must be "fast enough" if the designers aren't crazy. And AV2 is 5x slower than AV1. Therefore if compiled code is within a factor of five of hand-written assembly, it's "fast enough" for AV1, and h.264, and probably h.265 too.

You were disagreeing with my claim that other codecs could be "fast enough" with a safe compiler, right? If you weren't disagreeing, I don't know why you challenged me to show you some particular kind of code.

> 2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.

You're saying it's "simply impossible" for a compiler to optimize instructions to a certain level. But anything one person can code, another person can teach a compiler to do in similar situations. I don't need to show you an example, I just need to point you at the Church-Turing thesis and related documents.

link

imtringued 14 days ago

What's supposed to be the big source of unsafety in codecs though? Feels like the problem here is that C developers are ruining the reputation of C with their garbage code.

Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks. It feels like the C developers are getting the parts outside the ASM kernels wrong.

link

cogman10 14 days ago

> What's supposed to be the big source of unsafety in codecs though?

Hand written assembly. It's quite easy to accidentally start reading or manipulating a block of memory you didn't intend to when doing complex SIMD transformations.

> Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks.

I think you don't really understand how codecs work. It is not uncommon for a transformation like `a = b[c[i] * 3 + offset];`. There's no way for a compiler to omit the bounds check because it can't prove the contents of `c` aren't going to exceed the bounds of `b`.

This isn't a "crappy C developer" problem. This is a "There isn't a language that does a great job at capturing high level SIMD expressions" problem.

link