by ulber
1574 days ago
I think you misunderstood the approach. MuZero is being used to optimize the choices made during VP9 compression. In modern video codecs there are many ways to encode the same content. As a very simple example, you can vary how often you provide a full encoding of a frame and how often you encode only the differences between frames. Once this offline optimization is done, the result is still a valid VP9 encoding, just a smaller one. MuZero is not needed for decompression at all.
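To make the keyframe example concrete, here is a toy sketch (not VP9, and not MuZero): an "encoder" that stores either a full frame or a list of changed pixels, plus a single fixed decoder. The function names, the packet format, and the size model are all hypothetical, and a brute-force search over one knob stands in for the learned policy. The point it shows is the one above: the encoder's choices change the output size, while the decoder stays the same and every choice decodes back to identical frames.

```python
from typing import List, Tuple, Union

Frame = List[int]
# Hypothetical packet format: ("key", full_frame) or ("delta", [(index, value), ...]).
Packet = Tuple[str, Union[Frame, List[Tuple[int, int]]]]


def encode(frames: List[Frame], keyframe_interval: int) -> List[Packet]:
    """Toy encoder: a full frame every `keyframe_interval` frames, deltas otherwise.

    Assumes all frames have the same length.
    """
    packets: List[Packet] = []
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            packets.append(("key", list(frame)))
        else:
            prev = frames[i - 1]
            changes = [(j, v) for j, (p, v) in enumerate(zip(prev, frame)) if p != v]
            packets.append(("delta", changes))
    return packets


def decode(packets: List[Packet]) -> List[Frame]:
    """One fixed decoder, regardless of which choices the encoder made."""
    frames: List[Frame] = []
    for kind, payload in packets:
        if kind == "key":
            frames.append(list(payload))
        else:
            frame = list(frames[-1])
            for j, v in payload:
                frame[j] = v
            frames.append(frame)
    return frames


def cost(packets: List[Packet]) -> int:
    """Toy size model: 1 unit per stored pixel, 2 units per (index, value) pair."""
    return sum(len(p) if k == "key" else 2 * len(p) for k, p in packets)


def best_interval(frames: List[Frame], max_interval: int) -> int:
    """Exhaustive search over a single encoder knob (a stand-in for a learned policy)."""
    return min(range(1, max_interval + 1), key=lambda k: cost(encode(frames, k)))
```

For six 8-pixel frames where only one pixel changes per frame, encoding every frame in full costs 48 units, while a single keyframe followed by deltas costs 18, and both decode to the same frames.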
A cloud stack, from OS kernel settings to TCP/IP to database query optimizers to video codec settings to compiler settings, is made of thousands upon thousands of toggleable options. Each is usually left at its default, because no one on earth understands more than a small fraction of them, much less how to set them all appropriately for each task end-to-end. It's black boxes on top of black boxes all the way down. Collectively, inferior options could be giving up an enormous amount of performance. As performance-tuning experts have demonstrated, depending on how pessimal the defaults are, you can easily gain orders of magnitude just by moving to saner settings, never mind truly optimal ones. These sorts of posts turn up routinely on HN, and even in very well-tuned cloud stacks, you have to figure that gains of >10% should still be possible.
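As a rough illustration of "knobs left at their defaults", here is a minimal sketch of the crudest possible end-to-end tuner: random search over a handful of options, scored by one benchmark. Everything in it is hypothetical. The knob names, the candidate values, and `synthetic_latency` are made up and stand in for a real stack and a real benchmark run, and random search is a swapped-in baseline, not DRL. A MuZero-style agent would replace the random proposals with a learned policy and model.

```python
import random
from typing import Callable, Dict, List

Config = Dict[str, int]

# Hypothetical knobs and candidate values; a real stack has thousands of these.
SEARCH_SPACE: Dict[str, List[int]] = {
    "tcp_buffer_kb": [64, 256, 1024, 4096],
    "db_cache_mb": [128, 512, 2048],
    "encoder_threads": [1, 2, 4, 8],
}
DEFAULTS: Config = {"tcp_buffer_kb": 64, "db_cache_mb": 128, "encoder_threads": 1}


def synthetic_latency(cfg: Config) -> float:
    """Made-up cost model standing in for an end-to-end benchmark (lower is better)."""
    return (
        1000 / cfg["tcp_buffer_kb"] ** 0.5
        + 5000 / cfg["db_cache_mb"]
        + 40 / cfg["encoder_threads"]
        + 2 * cfg["encoder_threads"]  # toy penalty for thread contention
    )


def random_search(measure: Callable[[Config], float], trials: int, seed: int = 0) -> Config:
    """Start from the defaults and keep any random configuration that measures better."""
    rng = random.Random(seed)
    best = dict(DEFAULTS)
    best_score = measure(best)
    for _ in range(trials):
        cand = {k: rng.choice(vs) for k, vs in SEARCH_SPACE.items()}
        score = measure(cand)
        if score < best_score:
            best, best_score = cand, score
    return best
```

Because the search starts from the defaults and only accepts improvements, the result is never worse than the defaults on this benchmark. Whether it transfers to real workloads is exactly the hard part.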
MuZero here shows that it can work for one piece of the stack. And MuZero is, by design, an insanely general architecture: it handles two-player games like chess/Go and one-player games like ALE, handles continuous action spaces (Sampled MuZero), is reasonably sample-efficient (because it learns an environment model, which it can reuse more heavily, as in MuZero Reanalyse), handles hidden-information games against adversaries (Player of Games), and now OP shows self-play in a weird setting. (It still requires problem-specific input layers, but even that can be lifted if you're willing to pay for Perceiver inputs, which handle arbitrary input modalities.)
So you can see the potential here for handling much more of cloud operations (beyond current applications like datacenter cooling control) with DRL agents. Plunk down a MuZero on your entire stack and give it the goal of optimizing end-to-end for each specific task: DRL is expensive, but cloud scale is even more so. Needless to say, don't expect any released checkpoints on GitHub...