
The way to think of this is as part of Jeff Dean's "deep-learn all the cloud things!" thesis: https://www.gwern.net/Tool-AI#dean-2017

A cloud stack, from OS kernel settings to TCP/IP to database query optimizers to video codec settings to compiler settings, is made of thousands upon thousands of toggleable options, each of which is usually left at the default because no one on earth understands more than a small fraction of them, let alone how to set them all appropriately for each task end-to-end. It's black boxes on top of black boxes all the way down. Collectively, inferior options could be giving up an incredible amount of performance. As experts in performance tuning have demonstrated, depending on how pessimal the defaults are, you could easily gain orders of magnitude in performance just by moving to saner settings, never mind truly optimal ones (these sorts of posts turn up routinely on HN), and even in very well-tuned cloud stacks, you have to figure that gains of >10% should be possible.
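To make the "defaults leave performance on the table" point concrete, here is a toy sketch. Everything in it is an assumption for illustration: the four knobs, their values, and the multiplicative cost model in `simulated_latency_ms` are made up, and plain random search stands in for the optimizer (it is not MuZero or any DRL method). In a real stack the benchmark would be an expensive end-to-end measurement and the search space would be vastly larger.

```python
import random

# Hypothetical knobs; names and values are invented, not taken from any real system.
OPTIONS = {
    "tcp_buffer_kb": [64, 256, 1024, 4096],
    "codec_preset": ["fast", "medium", "slow"],
    "query_cache": [False, True],
    "compiler_opt": ["-O0", "-O2", "-O3"],
}
DEFAULTS = {
    "tcp_buffer_kb": 64,
    "codec_preset": "medium",
    "query_cache": False,
    "compiler_opt": "-O2",
}

def simulated_latency_ms(cfg):
    """Made-up stand-in for an end-to-end benchmark (lower is better)."""
    latency = 100.0
    latency *= {64: 1.5, 256: 1.2, 1024: 1.0, 4096: 1.1}[cfg["tcp_buffer_kb"]]
    latency *= {"fast": 0.9, "medium": 1.0, "slow": 1.3}[cfg["codec_preset"]]
    latency *= 0.7 if cfg["query_cache"] else 1.0
    latency *= {"-O0": 2.0, "-O2": 1.0, "-O3": 0.95}[cfg["compiler_opt"]]
    return latency

def random_search(trials, seed=0):
    """Sample random configs, keeping the best one seen (starting from the defaults)."""
    rng = random.Random(seed)
    best_cfg, best = dict(DEFAULTS), simulated_latency_ms(DEFAULTS)
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in OPTIONS.items()}
        score = simulated_latency_ms(cfg)
        if score < best:
            best_cfg, best = cfg, score
    return best_cfg, best

default_latency = simulated_latency_ms(DEFAULTS)
tuned_cfg, tuned_latency = random_search(trials=200)
print(f"default: {default_latency:.1f} ms, tuned: {tuned_latency:.1f} ms")
print("tuned config:", tuned_cfg)
```

Even this dumb search never does worse than the defaults, since it starts from them. The hard part in practice is that the knobs interact and each evaluation is costly, which is where a learned model of the environment would earn its keep.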

MuZero here shows that it can work for one piece of the stack. And MuZero is, by design, an insanely general architecture: it handles two-player games like chess/Go & one-player games like ALE, handles continuous action spaces (Sampled-MuZero), is reasonably sample-efficient (because it learns an environment model, and leaning on that model harder is MuZero-Reanalyzed), handles hidden-information games against adversaries (Player of Games), and now OP shows self-play in a weird setting. (It still requires problem-specific input layers, but even that can be lifted if you're willing to pay for Perceiver inputs, which handle arbitrary input modalities.)

So you can see the potential here for doing much more of cloud operations (beyond current applications like datacenter cooling control) with DRL agents. Plunk down a MuZero on your entire stack and assign it the goal of optimizing end-to-end for each specific task - DRL is expensive, but running at cloud scale is even more so. Needless to say, don't expect any released checkpoints on GitHub...