|
|
|
|
|
by fxtentacle
1655 days ago
|
|
This is a great result, but you can see that it's more of a theoretical case because of this: "converging to perfect play as available computation time and approximation capacity increases." That is true for pretty much all current deep reinforcement learning algorithms. The practical question is: How much computation do you need to get useful results? Alpha Go Zero is impressive mathematics, but who is willing to spend $1mio daily for months to train it? IMPALA (another Google one) can learn almost all Atari games, but you need a head node with 256 TPU cores and 1000+ evaluation workers to replicate the timings from the paper. |
|
Suppose you're a business that needs to play games. Most people seem to think that it's a matter of plugging in the settings from the paper, buying the same hardware, then clicking a button and waiting.
It's not. The specific settings matter a lot.
But my main point is that you'll get most of your performance pretty rapidly. The only reason to leave it running for so long is to get that last N%, which is nice for benchmarks but not for business.
DeepMind overspends. Actually, they don't; they're not paying anywhere close to the price of a 256 core TPU. (Many external companies aren't, either, and you can get a good deal by negotiating with the Cloud TPU team.)
But you don't need a 256 core TPU. Lots of times, these algorithms simply do not require the amount of compute that people throw at the problem.
On the other hand, you can also usually get access to that kind of compute. A 256 core TPU isn't beyond reach. I'm pretty sure I could create one right now. It's free, thanks to TFRC, and you yourself can apply (and be approved). I was. https://sites.research.google/trc/
It kills me that it's so hard to replicate these papers, which is most of the motivation for my comment here. Ultimately, you're right: "How much compute?" is a big unknown. But the lower bound is much lower than most people realize (and most researchers).