by pdkl95 3458 days ago
I believe[1] this isn't necessarily about broken files. There is a lot of variation allowed by the spec. One example that I've seen in the wild is extra-long (> 60 seconds) periods between I-frames. Seeking to an arbitrary point then requires searching backwards from the seek point for an I-frame and holding a massive amount of decoded data in RAM. As this usually isn't possible and would require decoding hundreds of frames, decoders may cheat and make do with as many P and B frames as they can handle.

[1] I haven't actually read most of the h.265 spec. It's possible these are technically invalid files.
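
To make the cost of the backward search concrete, here is a minimal sketch, assuming the positions of the I-frames are already known as a sorted list of frame indices. The function name `frames_to_decode` is hypothetical, and the sketch ignores B-frame reordering, so treat the count as a rough lower bound rather than what any real decoder does.

```python
from bisect import bisect_right

def frames_to_decode(keyframes, target):
    """Return (start, count): the I-frame to start from and how many
    frames must be decoded to reach `target`.

    keyframes: sorted list of frame indices that are I-frames (assumed known).
    """
    i = bisect_right(keyframes, target) - 1  # last I-frame at or before target
    if i < 0:
        raise ValueError("no I-frame at or before the seek point")
    start = keyframes[i]
    return start, target - start + 1

# 24fps video with an I-frame every 60 seconds, i.e. every 1440 frames.
keyframes = [0, 1440, 2880]
start, count = frames_to_decode(keyframes, 2000)
print(start, count)
```

With a 60-second gap, a seek that lands late in the span can need well over a thousand decoded frames before the requested one is available.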


A 1-minute span between I-frames would not be prohibitive for the parallel processing that the quoted part was referring to. With a 60-minute video it would still give you 60 segments to process in parallel.
A single uncompressed frame of 1080p video occupies 28MB in RAM, so 1 minute of 24fps video will take up 40GB. If you want to be able to run 4 cores at once it's 3 times that. You won't be doing that any time soon on your laptop or smartphone.
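
The arithmetic behind that estimate can be checked in a couple of lines. This is only a sketch that takes the 28MB-per-frame figure from the comment at face value; `buffer_bytes` is a hypothetical helper name.

```python
def buffer_bytes(frame_bytes, fps, seconds):
    """Bytes needed to hold every decoded frame of a span in memory."""
    return frame_bytes * fps * seconds

# 28MB per frame (the figure quoted in the comment), 24fps, 60 seconds.
one_minute = buffer_bytes(28 * 10**6, fps=24, seconds=60)
print(f"{one_minute / 10**9:.1f} GB")  # prints "40.3 GB"
```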
Curious as to your math? My naive thinking is 1920 * 1080 * 8 (generous) bytes is around 16MB.
I forgot where I got 28 from but it's indeed a mistake. For normal display you could get away with 1920 * 1080 pixels at 8 bits per channel (3 bytes per pixel), which is about 6MB. For a 10bit display it would be around 8MB. You do indeed often use 32bit float for high-quality processing, but since what we're storing here is the output frame, you would finish all that processing and then go down to 8 or 10bit per channel. So, redoing the math, that's roughly 8GB for 1 minute of video, still way too impractical.
I think the grandparent post is talking about decoding to RGB with a full 32-bit float per channel, which is 12 bytes per pixel rather than 8. The high precision is needed for HDR and for the extra processing you have to do to the pixels after they're decoded - motion compensation, gamma correction, etc.
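
The per-frame sizes being debated here follow from width * height * channels * bits per channel. A small sketch, assuming tightly packed three-channel RGB with no chroma subsampling and no padding (the same simplification the comments make); `frame_bytes` is a hypothetical helper name.

```python
def frame_bytes(width, height, channels, bits_per_channel):
    """Size in bytes of one uncompressed, tightly packed frame."""
    return width * height * channels * bits_per_channel // 8

# Bits per channel for the three cases discussed in the thread.
formats = {
    "8-bit RGB": 8,
    "10-bit RGB": 10,
    "32-bit float RGB": 32,
}
for name, bits in formats.items():
    size = frame_bytes(1920, 1080, 3, bits)
    print(f"{name}: {size / 10**6:.1f} MB per frame")
```

Under these assumptions the three cases come out at about 6.2MB, 7.8MB and 24.9MB per 1080p frame.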
The maximum number of reference frames, i.e. how much the Decoded Picture Buffer has to hold, is 16. So even if a GOP is 1 minute long you would have to hold at most 16 pictures in memory to have enough information to stream over that 1-minute segment.

So I still do not see how this would prohibit parallel processing.
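
To put the reference-frame bound in perspective, here is a rough comparison of holding 16 pictures versus holding every frame of a 1-minute GOP. It is a sketch only: the 16-frame limit is the figure quoted in the comment above, and 3 bytes per pixel (8-bit RGB) is an assumption made for illustration.

```python
MAX_REFERENCE_FRAMES = 16  # figure quoted in the comment, not checked against the spec

def picture_buffer_bytes(width, height, bytes_per_pixel, frames):
    """Memory needed to hold `frames` decoded pictures."""
    return width * height * bytes_per_pixel * frames

# Assumption: 1080p, 3 bytes per pixel, 24fps, 60-second GOP.
dpb = picture_buffer_bytes(1920, 1080, 3, MAX_REFERENCE_FRAMES)
whole_gop = picture_buffer_bytes(1920, 1080, 3, 24 * 60)
print(f"16 reference frames: {dpb / 10**6:.1f} MB")
print(f"whole 1-minute GOP:  {whole_gop / 10**9:.2f} GB")
```

Streaming through the segment needs on the order of 100MB under these assumptions, while keeping every decoded frame of the minute needs close to 9GB.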

Not sure how that would work. You have a thread that's decoding the frames 1 minute ahead of where playback is, so if you're not decoding full frames and storing them until you need to display them, what is that thread doing?
Transcoding or video editing in slices is a common application.

You cut the video into a handful of parts at keyframes, process the parts individually in a streaming manner, and then splice the partial results together.
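
The cut, process and splice workflow described above can be sketched as follows. This is a toy model, not a real transcoder: frames are plain strings, `process_segment` is a hypothetical stand-in for the per-segment work, and the first keyframe is assumed to be at index 0. Threads are used to keep the example short; CPU-bound work in Python would more likely use processes.

```python
from concurrent.futures import ThreadPoolExecutor

def split_at_keyframes(frames, keyframes):
    """Cut `frames` into segments that each start at a keyframe index.

    Assumes `keyframes` is sorted and starts at 0.
    """
    bounds = list(keyframes) + [len(frames)]
    return [frames[a:b] for a, b in zip(bounds, bounds[1:])]

def process_segment(segment):
    """Hypothetical stand-in for transcoding one segment in a streaming manner."""
    return [frame.upper() for frame in segment]

def process_in_parallel(frames, keyframes, workers=4):
    segments = split_at_keyframes(frames, keyframes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so splicing is a simple concatenation.
        results = list(pool.map(process_segment, segments))
    return [frame for segment in results for frame in segment]

frames = ["i0", "p1", "p2", "i3", "p4", "i5"]
print(process_in_parallel(frames, keyframes=[0, 3, 5]))
```

Each segment starts at a keyframe, so no worker needs frames from another worker's segment, which is what makes the split safe.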

If we're talking about playback then creating seeking-thumbnails could similarly benefit from parallel processing.