by brigade
2696 days ago
Scaling video encode to 112 CPU cores is hard. I haven't looked closely at this encoder, but the normal method to scale that high is to encode entire segments in parallel. (YouTube in particular supposedly does each segment single-threaded, which is why libvpx has terrible scaling.) Which effectively means encoding up to 112 independent 4k streams. Each stream could need:

- one source frame
- additional source frames for reordering (3-7 is pretty normal)
- additional source frames for rate control (x264's default is 40)
- recon for the frame being encoded
- reference frames (IIRC AV1 allows up to 8 to be stored)

Plus MVs, modes, maybe subpel caches, etc. That's easily 50-60 frames per stream. Times maybe 112 streams, that's around 6000 frames held in memory. Easily tunable of course, especially with even a little intra-segment parallelism.
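To put a rough number on that, here is a back-of-the-envelope sketch in Python. The per-stream counts come from the list above (taking the upper end of the reordering range). The 3840x2160 resolution, 8-bit 4:2:0 sampling, and the function names are my assumptions for illustration, and MVs, modes, caches, and row padding are ignored.

```python
# Back-of-the-envelope estimate of frame memory for segment-parallel encoding.
# Assumptions (not from any encoder's source): 3840x2160 frames, 4:2:0 chroma,
# 8 bits per sample, no padding, and no side data (MVs, modes, caches).

def frame_bytes(width=3840, height=2160, bytes_per_sample=1):
    """Size of one 4:2:0 frame: a full-size luma plane plus two
    quarter-size chroma planes, i.e. 1.5 samples per pixel."""
    return width * height * 3 // 2 * bytes_per_sample

def frames_per_stream(reorder=7, lookahead=40, refs=8):
    """Frames held by one independent stream, following the list above."""
    source = 1   # frame currently being encoded
    recon = 1    # reconstruction of that frame
    return source + reorder + lookahead + recon + refs

def total_bytes(streams=112, **frame_kwargs):
    """Total frame memory when every stream runs at once."""
    return streams * frames_per_stream() * frame_bytes(**frame_kwargs)

if __name__ == "__main__":
    per_stream = frames_per_stream()
    print("frames per stream:", per_stream)
    print("frames in flight: ", 112 * per_stream)
    print("8-bit total (GB): ", total_bytes() / 1e9)
    # 10-bit content is commonly stored as 16-bit samples, which doubles it.
    print("16-bit total (GB):", total_bytes(bytes_per_sample=2) / 1e9)
```

With these assumptions it comes to 57 frames per stream, 6384 frames in flight, and about 79 GB of raw frame data at 8 bits per sample, or roughly double that if samples are stored as 16-bit. Shrinking the lookahead or running fewer streams with some intra-segment threading brings it down quickly.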
|
From what I've seen, AV1 breaks frames/segments up into a kd-tree and brute-forces the leaves to find the transformation that looks the best at the smallest size. An oversimplification, obviously, but with everything that encoders are doing I still think it is naive to design them with such a simplistic view of concurrency that the input has to be treated as a hundred small files for a hundred CPU cores.