
I wouldn't say that makes games orthogonal to real-world problems. That's what makes them good stepping stones. Risk-free, "cheap" testing makes for fast research.

I totally agree about the ability to just skirt sample complexity. It's a tough one, made tougher by how early stage this work really is. We want bots to be able to match human ability and match human learning. Though the two get lumped together, they're very separate concerns.

For matching human ability, we're just beginning to learn techniques that get bots to master hard tasks (e.g. incomplete information games, Atari games, picking objects up). Those bots mostly learn waaaaaay slower than people. But never mastering is worse than slowly mastering, so it's early days.

On the other hand, you have people working on efficient learning. This is the question you're getting at with compute scaling arbitrarily-ish. It's more impressive if a bot can master a game after only playing it a small number of times. People are definitely working on this too, but for even simpler tasks. There's a lot of work right now in contextual bandits on learning fast, and that's a kind of baby-RL task. Even there, simulation tasks are super important because you really need a counterfactual (what the choices you didn't make would have paid) to say whether you're doing well compared to alternatives.
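
To make the counterfactual point concrete, here's a rough sketch of a toy contextual bandit run in simulation. Everything in it is made up for illustration: the two contexts, the two arms, the payoff table `true_p`, and the plain epsilon-greedy learner. It isn't anyone's actual benchmark or method. The thing to notice is that regret can only be computed because the simulator knows what the arm you didn't pull would have paid on average.

```python
import random


def run_simulated_bandit(n_rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a toy 2-context, 2-arm bandit; returns (reward, regret)."""
    rng = random.Random(seed)

    # Hypothetical simulator: true success probability per (context, arm).
    # A real-world log would never expose this table.
    true_p = {0: [0.8, 0.2], 1: [0.3, 0.7]}

    counts = {c: [0, 0] for c in true_p}      # pulls per (context, arm)
    sums = {c: [0.0, 0.0] for c in true_p}    # summed reward per (context, arm)
    total_reward = 0.0
    regret = 0.0

    for _ in range(n_rounds):
        ctx = rng.choice([0, 1])

        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(2)
        else:
            # Exploit: pick the arm with the best observed mean in this context.
            means = [
                sums[ctx][a] / counts[ctx][a] if counts[ctx][a] else 0.0
                for a in range(2)
            ]
            arm = max(range(2), key=lambda a: means[a])

        reward = 1.0 if rng.random() < true_p[ctx][arm] else 0.0
        counts[ctx][arm] += 1
        sums[ctx][arm] += reward
        total_reward += reward

        # The counterfactual: expected payoff of the best arm minus the one we
        # chose. Only computable because the simulator hands us true_p.
        regret += max(true_p[ctx]) - true_p[ctx][arm]

    return total_reward, regret


if __name__ == "__main__":
    learned = run_simulated_bandit(epsilon=0.1)
    uniform = run_simulated_bandit(epsilon=1.0)  # epsilon=1.0 is pure random play
    print("epsilon-greedy (reward, regret):", learned)
    print("uniform random (reward, regret):", uniform)
```

Setting `epsilon=1.0` gives a uniform-random baseline to compare against. Outside a simulator you only ever see the reward for the arm you actually pulled, so that regret line has nothing to subtract from.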