by sillysaurusx
2355 days ago
It should be similarly efficient. AlphaZero used 5,000 first-generation TPUs to generate self-play games, and a much smaller number of TPUs to train the model on the previous self-play results. In its predecessor, AlphaGo Zero, whenever training produced a model that won at least 55% of its evaluation games against the current best, that became the new model. The same algorithm could be applied here.
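
For anyone who wants the promotion rule spelled out, here is a rough sketch of the "keep the candidate only if it clears 55%" loop. It is a toy, not DeepMind's code: each model is reduced to a single made-up strength number, `play_game` is a hypothetical stand-in for a real evaluation game, the Elo-style win probability and the 400-game evaluation size are assumptions, and the threshold is treated as inclusive.

```python
import random

WIN_THRESHOLD = 0.55  # promotion bar from the comment; inclusive here by assumption


def play_game(candidate, best, rng):
    """Hypothetical stand-in for one evaluation game.

    Models are reduced to a single strength number, and the candidate wins
    with an Elo-style logistic probability of the strength difference.
    """
    p_win = 1.0 / (1.0 + 10 ** ((best - candidate) / 400.0))
    return rng.random() < p_win


def evaluate(candidate, best, rng, games=400):
    """Fraction of evaluation games won by the candidate."""
    wins = sum(play_game(candidate, best, rng) for _ in range(games))
    return wins / games


def should_promote(rate, threshold=WIN_THRESHOLD):
    """Gate: the candidate replaces the best model only at or above the bar."""
    return rate >= threshold


def run(generations, seed=0):
    """Run the gated loop and return the best strength after each generation."""
    rng = random.Random(seed)
    best = 0.0
    history = [best]
    for _ in range(generations):
        # Stand-in for "train on the previous self-play results":
        # a noisy step that usually, but not always, improves the model.
        candidate = best + rng.gauss(20.0, 60.0)
        if should_promote(evaluate(candidate, best, rng)):
            best = candidate
        history.append(best)
    return history
```

Calling `run(10)` returns eleven strength values, one per generation plus the starting point. In a real setup the expensive part is `evaluate`, which is why self-play and evaluation get far more hardware than training.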