Hacker News
by justinlaster 3467 days ago
> a H.264 hardware decoder can decode all H.264 files

and

> video decoding is easily parallelizable

At a previous job (I don't know if it was just the field I was in or just bad luck), having to explain these two points over and over again was kind of a personal nightmare.

That being said, this is an excellent list!

Curious: why is this? Does this assume streaming video, where you can't look ahead in the stream?

If you can jump ahead, it would seem to be easy to have multiple threads, starting at key frames to decode the content. You'd have to splice them together, but this seems possible.
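
The naive approach described here can be sketched roughly as follows. This is a toy under loudly stated assumptions: frame types are given in decode order, every GOP is closed (no frame references another segment), and `decode_segment` is a hypothetical placeholder rather than any real decoder API. The closed-GOP assumption is exactly what the rest of the thread questions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_keyframes(frame_types):
    """Group frame indices (decode order) into segments that each start at an
    I-frame. Assumes closed GOPs, i.e. no frame references another segment."""
    segments = []
    for index, frame_type in enumerate(frame_types):
        # A new segment starts at every I-frame. Frames before the first
        # I-frame still get a segment, but a real decoder could not decode them.
        if frame_type == "I" or not segments:
            segments.append([])
        segments[-1].append(index)
    return segments


def decode_segment(segment):
    # Hypothetical placeholder for a real decoder call: just label each frame.
    return [f"frame{i}" for i in segment]


def naive_parallel_decode(frame_types, workers=4):
    segments = split_at_keyframes(frame_types)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so "splicing" is a concat.
        decoded = list(pool.map(decode_segment, segments))
    return [frame for segment in decoded for frame in segment]
```

Everything hard is hidden inside the closed-GOP assumption and inside `decode_segment`.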

> it would seem to be easy to have multiple threads, starting at key frames to decode the content.

It's a resource issue (memory, CPU, etc., and meeting latency requirements within those constraints), on top of the subtly different versions of "H.264" that hardware and software actually follow, plus a few other intricacies of how the standard works in the first place. Again, it's not that it can't be done; as the article says, it can't be done easily, or at least not consistently in certain situations.

Key frames are a good anchor for anything you're doing with H.264 (and other formats), but they're not the be-all and end-all, and they may even cause you trouble if you "trust" them too much. It's perhaps a bit like date/time programming. You can fairly easily create something that works for a decent amount of time, and even if it ends up being incorrect your clients may not even notice... or it may break down catastrophically in the future. But that approach is certainly not correct, and it's certainly not professional. Quite honestly, I'd say date/time programming looks like a dream compared to the inconsistent nightmare that is video programming. Date/time logic needs to be sound because many programs rely on consistent and sane output from a program perspective, whereas video programming gets to slide as long as the output is generally correct from a human visual perspective.

It's been a few years since I've dived into this stuff, so some things may have changed/gotten cleaned up. But the article seems to indicate that the ecosystem hasn't really changed.

1) You are now assuming that "seeking to a position will produce the same output as decoding to a position"; even if the video is well-formed (and you don't end up with massive issues where the key frames just don't work correctly) you are likely going to end up with subtle discontinuities between every segment.

2) You now have to buffer a couple of seconds' worth of uncompressed video somewhere, probably not on the GPU, which means a much higher I/O bandwidth requirement somewhere that isn't good at that, so this will probably only be sort of parallel. (FWIW, I believe most people who try to do parallel video decoding assume that they can have different parts of the decoder concentrate on different sections of the screen, which sounds good until you see how non-local video decoding can be.)
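
To put a rough number on the buffering point, here is a back-of-the-envelope sketch. It assumes unpadded planar YUV, with samples above 8 bits stored in 16-bit containers; real decoders align and pad their surfaces, so actual usage will be somewhat higher. The function names are illustrative only.

```python
# Extra chroma samples per luma sample, as a fraction (numerator, denominator).
CHROMA_PER_LUMA = {"4:2:0": (1, 2), "4:2:2": (1, 1), "4:4:4": (2, 1)}


def raw_frame_bytes(width, height, bits_per_sample=8, chroma="4:2:0"):
    """Bytes in one unpadded planar YUV frame."""
    luma = width * height
    num, den = CHROMA_PER_LUMA[chroma]
    samples = luma + luma * num // den
    # Samples above 8 bits are assumed to sit in 16-bit containers.
    bytes_per_sample = 1 if bits_per_sample <= 8 else 2
    return samples * bytes_per_sample


def segment_buffer_bytes(width, height, fps, seconds, **frame_kwargs):
    """Bytes needed to hold `seconds` of decoded frames for one segment."""
    return raw_frame_bytes(width, height, **frame_kwargs) * fps * seconds
```

For 8-bit 4:2:0 1080p30, one frame is 3,110,400 bytes, so two seconds is 186,624,000 bytes (about 187 MB) per segment in flight. Four segments decoding concurrently already come to roughly 750 MB, and all of it has to be written out and read back in order.
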
> 1) You are now assuming that "seeking to a position will produce the same output as decoding to a position"; even if the video is well-formed (and you don't end up with massive issues where the key frames just don't work correctly) you are likely going to end up with subtle discontinuities between every segment.

Wouldn't "the keyframes just don't work correctly" result in corrupted output anyway?

If we're worrying about already-broken situations then it is quite obvious that additional breakage may occur in related features.

I think the point is that video really is that broken, and the only reason it works at all is that everyone has workarounds for everyone else's bugs. At least that's my experience with video. It's all a disaster.

Yes, this. Working with video is as though there were no such thing as a documented API or standards document, but instead, you find the longest-lived bugs in the popular toolchains and in the clients of your customers, and those bugs are the foundation of the interfaces you implement.

I believe[1] this isn't necessarily about broken files. There is a lot of variation allowed by the spec. One example that I've seen in the wild is extra-long (> 60 seconds) periods between I-frames. Seeking to an arbitrary point requires searching backwards from the seek point for an I-frame, decoding every frame from there forward, and storing a massive amount of data in RAM. Because this usually isn't possible, and can mean decoding hundreds of frames, the decoder may cheat and make do with as many P and B frames as it can handle.

[1] I haven't actually read most of the H.265 spec. It's possible these are technically invalid files.
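
A quick way to see the seek cost described above: if I-frames are the only entry points, showing frame `target` means decoding every frame from the preceding I-frame up to it. This sketch assumes decode order equals display order (no B-frames) and that every I-frame is a clean entry point; `frames_to_decode_for_seek` is a hypothetical helper, not a library function.

```python
def frames_to_decode_for_seek(frame_types, target):
    """Return (keyframe_index, frames_decoded) needed to show frame `target`.

    Assumes frame_types is in decode order (no B-frame reordering) and that
    only 'I' frames are usable random-access points.
    """
    # Walk backwards from the seek target to the nearest I-frame.
    for start in range(target, -1, -1):
        if frame_types[start] == "I":
            return start, target - start + 1
    raise ValueError("no I-frame at or before the seek target")
```

With a 60-second I-frame interval at 30 fps, the worst case is decoding 1,800 frames just to display one.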

A 1-minute span between I-frames would not be prohibitive for the parallel processing the quoted part was referring to: a 60-minute video would still give you 60 segments to process in parallel.

Many of the listed points in TFA are not about brokenness. A good chunk cover rarely used features or less common codecs for advanced applications.

As an example, there exist bitstreams with no actual keyframes; instead, the encoder guarantees that the decoder output converges to the correct picture after decoding some number of frames. It's actually kinda how MDCT audio codecs work; it's just very rare in video.
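
The keyframe-free scheme described here is usually called gradual decoder refresh (or periodic intra refresh), and H.264 can signal the convergence point with a recovery point SEI message. The sketch below is a toy model of the idea, not of real bitstream mechanics: the picture is split into columns, frame n intra-codes column n % columns, and every other column is assumed to predict only from the same column of the previous frame, so a clean column stays clean.

```python
def frames_until_clean(columns, start_frame):
    """Toy gradual-decoder-refresh model.

    Frame n intra-codes column n % columns; all other columns are assumed to
    be predicted only from the same column of the previous frame. Returns how
    many frames a decoder joining at start_frame must decode before every
    column is correct.
    """
    if columns < 1:
        raise ValueError("need at least one column")
    clean = [False] * columns  # nothing is trustworthy when joining mid-stream
    decoded = 0
    frame = start_frame
    while not all(clean):
        # The intra-coded column needs no reference, so it becomes correct.
        # Other columns keep whatever state they inherited from the last frame.
        clean[frame % columns] = True
        decoded += 1
        frame += 1
    return decoded
```

In this toy, a decoder joining at any frame converges after exactly `columns` frames; there is simply no single frame you could call "the keyframe".
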
It seems easy, but each frame depends on previous frames... so now you need to share lots of data between threads. It's not as embarrassingly parallel as it looks from a naive perspective.

That said, I contend that most decoders are very threadable; it's just that the people trying to do it usually lack the time or the skill, more often the former.

In my experience, the state of video in programming is a total mess.
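
To illustrate how inter-frame dependencies limit parallelism, here is a toy sketch using Python's `graphlib.TopologicalSorter` (3.9+). It assumes a simple IBBP structure in display order (P references the previous I/P frame, B references the nearest I/P frame on each side) that starts with an I-frame; real streams can use more references and hierarchical B-frames. `ibbp_references` and `decode_waves` are illustrative names.

```python
from graphlib import TopologicalSorter


def ibbp_references(display_types):
    """Toy reference structure for frame types in display order: P references
    the previous I/P frame, B references the nearest I/P frame on each side.
    Assumes the sequence starts with an I-frame."""
    anchors = [i for i, t in enumerate(display_types) if t in ("I", "P")]
    refs = {}
    for i, t in enumerate(display_types):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = set()
        elif t == "P":
            refs[i] = {earlier[-1]}
        else:
            if not later:
                raise ValueError("a trailing B-frame has no future reference")
            refs[i] = {earlier[-1], later[0]}
    return refs


def decode_waves(refs):
    """Batches of frames whose references are all decoded; the frames in one
    batch could in principle be decoded concurrently."""
    sorter = TopologicalSorter(refs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For `"IBBPBBP"` this gives `[[0], [3], [1, 2, 6], [4, 5]]`: the I/P chain 0 → 3 → 6 stays strictly serial, and only the B-frames add width. An all-P stream like `"IPPP"` degenerates to one frame per wave.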

Decoding frames ahead of time gives no benefit to a user watching the video. The problem is how to decode a single frame in parallel. Contrary to the view expressed elsewhere, hardware decoders do a lot in parallel. As MultiCoreWare pointed out, one of the biggest challenges is latency.
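
One common way to decode a single frame in parallel is the macroblock "wavefront". This sketch assumes each macroblock (x, y) depends only on its left, top-left, top and top-right neighbours, a simplification of H.264's prediction neighbours. That puts macroblock (x, y) at step x + 2y, and everything in one step can run concurrently. It ignores entropy decoding, which with CABAC is serial within a slice. `wavefront_schedule` is an illustrative name.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks (x, y) into steps of a 2D wavefront.

    Assumes each macroblock depends on its left, top-left, top and top-right
    neighbours, so (x, y) can start at step x + 2*y; all macroblocks in one
    step are mutually independent.
    """
    steps = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            steps.setdefault(x + 2 * y, []).append((x, y))
    return [steps[t] for t in sorted(steps)]
```

For 1080p (120 × 68 macroblocks, because the 1080 coded height is padded to 1088), the 8,160 macroblocks collapse into 254 steps with at most 60 in flight at once. The ramp-up and ramp-down at the start and end of each frame also cost latency.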