by saurik 3467 days ago
1) You are now assuming that "seeking to a position will produce the same output as decoding to a position"; even if the video is well-formed (and you don't end up with massive issues where the key frames just don't work correctly) you are likely going to end up with subtle discontinuities between every segment.

2) You are now going to have to be buffering a couple of seconds' worth of uncompressed video somewhere, probably not on the GPU, leading to a much higher I/O bandwidth requirement somewhere that isn't good at that, so this is probably only going to be sort of parallel. (FWIW, I believe most people who try to do parallel video decoding are assuming that they can have different parts of the decoder concentrate on different sections of the screen, which sounds good until you see how non-local video decoding can be.)

> 1) You are now assuming that "seeking to a position will produce the same output as decoding to a position"; even if the video is well-formed (and you don't end up with massive issues where the key frames just don't work correctly) you are likely going to end up with subtle discontinuities between every segment.

Wouldn't "the keyframes just don't work correctly" result in corrupted output anyway?

If we're worrying about already-broken situations then it is quite obvious that additional breakage may occur in related features.

I think the point is that video definitely is that broken and the only reason video does work is because everyone has work-arounds for everyone else's bugs. At least that's my experience with video. It's all a disaster.
Yes, this. Working with video is as though there were no such thing as a documented API or standards document, but instead, you find the longest-lived bugs in the popular toolchains and in the clients of your customers, and those bugs are the foundation of the interfaces you implement.
I believe[1] this isn't necessarily about broken files. There is a lot of variation allowed by the spec. One example that I've seen in the wild is extra-long (> 60 seconds) periods between I-frames. Seeking to an arbitrary point then requires searching backwards from the seek point for an I-frame and using a massive amount of RAM. As this usually isn't possible and would require decoding hundreds of frames, the decoder may cheat and make do with as many P- and B-frames as it can handle.

[1] I haven't actually read most of the H.265 spec. It's possible these are technically invalid files.
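
To put a rough number on the cost of that backwards search, here is a minimal Python sketch. The function name `frames_to_decode` and the sample stream (24 fps, one I-frame every 60 seconds) are made up for illustration. It assumes the list of I-frame positions is sorted and that every frame from the I-frame up to the target has to be decoded in order.

```python
from bisect import bisect_right

def frames_to_decode(keyframe_indices, target):
    """Return (start, count): the I-frame to start decoding from and the
    number of frames that must be decoded to display frame `target`.

    Assumes `keyframe_indices` is sorted in ascending order."""
    pos = bisect_right(keyframe_indices, target) - 1
    if pos < 0:
        raise ValueError("no keyframe at or before the target frame")
    start = keyframe_indices[pos]
    return start, target - start + 1

# Hypothetical stream: 24 fps, one I-frame every 60 seconds, 60 minutes long.
FPS = 24
keyframes = [minute * 60 * FPS for minute in range(60)]

# Seek to 0:59, one second before the second I-frame.
start, count = frames_to_decode(keyframes, target=59 * FPS)
print(f"start at frame {start}, decode {count} frames to show one")
```

In this made-up stream, seeking to 0:59 means starting at frame 0 and decoding 1417 frames to display a single one, which is the situation where a decoder is tempted to cheat.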

A 1-minute span between I-frames would not be prohibitive for the parallel processing that the quoted part was referring to. With a 60-minute video it would still give you 60 segments to process in parallel.
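
A minimal sketch of that segmentation in Python, for what it's worth. `keyframe_segments` is a hypothetical helper, not anything from a real library, and it assumes each I-frame starts a closed group (no frame references anything before its own I-frame), which is exactly the assumption the parent comment is questioning.

```python
def keyframe_segments(total_frames, keyframe_indices):
    """Split [0, total_frames) into half-open (start, end) frame ranges,
    one per I-frame. Assumes `keyframe_indices` is sorted and that no
    frame depends on anything before the I-frame that opens its range."""
    bounds = list(keyframe_indices) + [total_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(keyframe_indices))]

# Hypothetical 60-minute video at 24 fps with an I-frame every minute.
FPS = 24
total_frames = 60 * 60 * FPS
keyframes = list(range(0, total_frames, 60 * FPS))

segments = keyframe_segments(total_frames, keyframes)
print(len(segments), segments[0], segments[-1])
```

Each range could then be handed to a separate worker, as long as the closed-group assumption actually holds for the file.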
A single uncompressed frame of 1080p video occupies 28MB in RAM, so 1 minute of 24fps video will take up 40GB. If you want to be able to run 4 cores at once it's 3 times that. You won't be doing that any time soon on your laptop or smartphone.
Curious as to your math? My naive thinking is 1920 * 1080 * 8 (generous) bytes is around 16MB.
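
For anyone who wants to check the arithmetic, here is a small Python sketch. The pixel formats in the table are my own picks for comparison. The 28MB figure is taken from the parent comment and only used to back out what bytes-per-pixel it would imply.

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 24

def frame_bytes(bytes_per_pixel, width=WIDTH, height=HEIGHT):
    """Size in bytes of one uncompressed frame."""
    return int(width * height * bytes_per_pixel)

def minute_bytes(bytes_per_pixel, fps=FPS):
    """Size in bytes of 60 seconds of uncompressed frames."""
    return frame_bytes(bytes_per_pixel) * fps * 60

# 8-bit 4:2:0 averages 1.5 bytes per pixel; the other rows are for comparison.
for label, bpp in [("8-bit 4:2:0", 1.5), ("8-bit RGBA", 4), ("8 bytes/pixel", 8)]:
    print(f"{label:>14}: {frame_bytes(bpp) / 1e6:6.1f} MB/frame, "
          f"{minute_bytes(bpp) / 1e9:5.1f} GB/minute")

# Bytes per pixel that a 28 MB 1080p frame would imply.
implied_bpp = 28e6 / (WIDTH * HEIGHT)
print(f"28 MB/frame implies about {implied_bpp:.1f} bytes per pixel")
```

At 8 bytes per pixel a frame is about 16.6 MB and a minute is about 23.9 GB; at 8-bit 4:2:0 it is about 3.1 MB per frame and about 4.5 GB per minute.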
The maximum number of reference frames, i.e. how much the Decoded Picture Buffer has to hold, is 16. So even if a GOP is 1 minute long you would have to hold at most 16 pictures in memory to have enough information to stream over that 1-minute segment.

So I still do not see how this would prohibit parallel processing.
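
As a back-of-the-envelope check in Python: the 16-picture limit is the figure from this comment, while the 8-bit 4:2:0 layout (1.5 bytes per pixel) and the 4 parallel workers are assumptions for illustration. `dpb_bytes` is a made-up name.

```python
def dpb_bytes(width, height, bytes_per_pixel, max_refs=16):
    """Upper bound on the memory needed to hold `max_refs` decoded pictures."""
    return int(width * height * bytes_per_pixel) * max_refs

# Assumed: 1080p, 8-bit 4:2:0 (1.5 bytes per pixel on average).
per_decoder = dpb_bytes(1920, 1080, 1.5)

# Assumed: 4 independent decoders, each working on its own segment.
workers = 4
total = per_decoder * workers

print(f"{per_decoder / 1e6:.1f} MB per decoder, {total / 1e6:.1f} MB for {workers}")
```

That comes to roughly 50 MB per decoder and about 200 MB for four of them, before counting whatever output frames are buffered for display.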

Many of the listed points in TFA are not about broken-ness. A good chunk cover rarely-used features or less commonly used codecs for advanced applications.
As an example, there exist bitstreams where there aren't actually any keyframes, but instead the encoder guarantees that the decoder output converges to correct after decoding some number of frames. It's actually kinda how MDCT audio codecs work; it's just very rare in video.
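
A toy Python model of that convergence, as a sketch only. It assumes one particular scheme, periodic intra refresh, where each frame re-codes one strip of the picture from scratch and prediction never reads from a strip that hasn't been refreshed yet. The comment above doesn't say which scheme those bitstreams use, and `frames_until_clean` is a made-up name.

```python
def frames_until_clean(num_strips, join_frame):
    """Count how many frames a decoder must process, starting at
    `join_frame`, before every strip of the picture has been refreshed.

    Toy model: frame n intra-codes strip (n % num_strips), and a strip
    stays correct once it has been refreshed."""
    clean = [False] * num_strips
    frame = join_frame
    decoded = 0
    while not all(clean):
        clean[frame % num_strips] = True  # this strip is now correct
        frame += 1
        decoded += 1
    return decoded

# Joining mid-stream at an arbitrary frame, with the picture split into 10 strips.
print(frames_until_clean(num_strips=10, join_frame=37))
```

In this model the output is fully correct after exactly `num_strips` frames no matter where the decoder joins, even though no single frame is a keyframe.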