by w1
2700 days ago
Per DeepMind's blog post [1], an agent was first trained via supervised learning on pro matches. That agent was then forked repeatedly as a population of agents learned through tournament-style self-play. So while the initial strategies could have been seeded by pro play styles, the final models were the product of agents learning from games against other agents.

[1] https://deepmind.com/blog/alphastar-mastering-real-time-stra...
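For intuition, here's a toy Python sketch of that pipeline. It is purely illustrative and not DeepMind's training code: each agent's policy is reduced to a hypothetical scalar `skill`, "supervised learning" just averages some hypothetical pro ratings, and "learning" is a fixed bump for the winner of each match. The names `Agent`, `supervised_init`, `play`, and `league` are made up for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    skill: float      # hypothetical scalar stand-in for a whole policy network
    parent: str = ""  # name of the agent this one was forked from


def supervised_init(pro_ratings: list[float]) -> Agent:
    # Toy "imitation": the seed agent starts at the average strength in the pro data.
    return Agent("sl-0", sum(pro_ratings) / len(pro_ratings))


def play(a: Agent, b: Agent, rng: random.Random) -> Agent:
    # Toy match: a wins with probability skill_a / (skill_a + skill_b).
    return a if rng.random() < a.skill / (a.skill + b.skill) else b


def league(pro_ratings: list[float], generations: int = 5,
           matches: int = 20, seed: int = 0) -> list[Agent]:
    rng = random.Random(seed)
    population = [supervised_init(pro_ratings)]
    for gen in range(generations):
        # Fork: every agent spawns a child that starts as a copy of itself.
        population += [Agent(f"{a.name}/g{gen}", a.skill, parent=a.name)
                       for a in population]
        # Self-play tournament: random pairings, and only the winner "learns".
        for _ in range(matches):
            a, b = rng.sample(population, 2)
            play(a, b, rng).skill += 0.1
    return population


if __name__ == "__main__":
    pop = league([3.0, 4.0, 5.0])
    best = max(pop, key=lambda agent: agent.skill)
    print(len(pop), "agents; strongest:", best.name, round(best.skill, 2))
```

Every agent descends from the supervised seed `sl-0`, but any strength beyond the seed's starting value comes only from games against other agents in the population, which is the point being made above.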