by w1 2700 days ago
Per DeepMind's blog post[1], an agent was initially trained via supervised learning on pro matches. That agent was then forked repeatedly, and the resulting population of agents learned via tournament-style self-play. So, while the initial strategies could have been seeded by pro play styles, the final models came from agents learning by playing against other agents.

[1] https://deepmind.com/blog/alphastar-mastering-real-time-stra...
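
For what it's worth, here is a toy sketch of the seed, fork, and tournament loop described above. It is not DeepMind's training code: each "agent" is reduced to a single hypothetical skill number, the supervised stage is stood in for by a fixed `seed_skill`, and learning is approximated by replacing the weakest agent with a perturbed copy of the strongest. The names `play` and `league` and every parameter are assumptions made up for illustration.

```python
import random


def play(a, b, rng):
    """Return True if agent a beats agent b (toy model: higher skill wins more often)."""
    p_a_wins = 1.0 / (1.0 + 10 ** ((b - a) / 4.0))
    return rng.random() < p_a_wins


def league(seed_skill=1.0, population=4, rounds=50, seed=0):
    """Toy population self-play; all values here are illustrative assumptions."""
    rng = random.Random(seed)
    # Stand-in for the supervised stage: every agent is a fork of one seed agent.
    agents = [seed_skill + rng.gauss(0, 0.1) for _ in range(population)]
    for _ in range(rounds):
        wins = [0] * len(agents)
        # Round-robin tournament: every agent plays every other agent once.
        for i in range(len(agents)):
            for j in range(i + 1, len(agents)):
                if play(agents[i], agents[j], rng):
                    wins[i] += 1
                else:
                    wins[j] += 1
        best = max(range(len(agents)), key=wins.__getitem__)
        worst = min(range(len(agents)), key=wins.__getitem__)
        # Replace the weakest agent with a perturbed fork of the strongest.
        agents[worst] = agents[best] + rng.gauss(0, 0.1)
    return agents


if __name__ == "__main__":
    print(league())
```

The point of the sketch is only that pro play enters once, through the seed. After that, every update depends on games between agents in the population.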

1 comment

In other words, the final models evolved by playing against each other, and they all started from pro strategies. So, again, it seems fairly obvious to me that they would end up using pro strategies and, at best, just try to improve on them.