by FiberBundle 2195 days ago
Does anybody know how long it would take to train an AlphaZero Go version using one GPU? In [1] they claim that it took 13 hours until the model was able to beat the original AlphaGo version, but they don't state what hardware they used.

[1] https://deepmind.com/blog/article/alphazero-shedding-new-lig...

4 comments

From an offline chat with the original author,

From the ELF OpenGo paper [1] (ELF OpenGo is an open implementation of AlphaGo Zero developed by Facebook AI):

"First, we train a superhuman model for ELF OpenGo. After running our AlphaZero-style training software on 2,000 GPUs for 9 days, our 20-block model has achieved superhuman performance that is arguably comparable to the 20-block models described in Silver et al. (2017) and Silver et al. (2018)."

[1]: https://arxiv.org/pdf/1902.04522.pdf
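
To put the quoted figures in single-GPU terms, here is a back-of-the-envelope sketch. It assumes perfectly linear scaling (which distributed training does not really achieve), the same GPU model as in the paper's cluster, and a 365-day year; the function name is made up for illustration. Treat the result as an order-of-magnitude guess, not a measurement.

```python
def single_gpu_estimate(num_gpus: int, days: float) -> dict:
    """Convert a cluster training run into equivalent single-GPU time.

    Assumption (not from the paper): total work is simply num_gpus * days,
    i.e. perfectly linear scaling with no communication overhead.
    """
    gpu_days = num_gpus * days
    return {
        "gpu_days": gpu_days,
        "gpu_hours": gpu_days * 24,
        "years_on_one_gpu": gpu_days / 365,
    }


# Figures quoted from the ELF OpenGo paper: 2,000 GPUs for 9 days.
estimate = single_gpu_estimate(num_gpus=2000, days=9)
print(estimate)
```

Under those assumptions, 2,000 GPUs for 9 days is 18,000 GPU-days, or roughly 49 years on a single GPU.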

I can’t find it now but iirc there was a blog post on HN about a month ago that estimated their training costs at $25 million, using many TPU pods.
Here is the guesstimate: https://www.yuzeh.com/data/agz-cost.html
I agree with the quoted numbers. As I mentioned in another comment, you have to keep in mind that AlphaZero is an extremely sample-inefficient learning technique, even for simple problems. However, it has two major strengths: 1) it is pretty generic and 2) it can leverage huge amounts of computing power.
What would be an example of a more sample-efficient algorithm?
That was with one or more TPU pods, iirc

https://cloud.google.com/tpu/docs/system-architecture