

	by Calms 2699 days ago
	My understanding is that AlphaStar is trained with reinforcement learning. I suspect that training is unsupervised, so it would have learned this behaviour independently, without pro replays.

1 comments

w1 2699 days ago

Per DeepMind's blog post[1], an agent was initially trained via supervised learning on pro matches. That agent was then forked repeatedly, and the resulting population of agents learned via tournament-style self-play. So, while the initial strategies could have been seeded by pro play styles, the final models were the result of models learning from games against other models.

[1] https://deepmind.com/blog/alphastar-mastering-real-time-stra...
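
To make the two stages concrete, here is a rough toy sketch of "seed by imitation, then fork and self-play", using rock-paper-scissors instead of StarCraft. Everything in it is an assumption for illustration: the names (`imitate`, `fork`, `play`, `self_play_league`) are hypothetical, and it swaps in simple selection plus random perturbation where the real system would use reinforcement learning updates. It is not DeepMind's method, just the general shape of the idea.

```python
import random

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def imitate(replays):
    """Stage 1 (toy supervised learning): copy move frequencies from replays."""
    counts = {m: 1 for m in MOVES}  # start at 1 so no move has zero probability
    for move in replays:
        counts[move] += 1
    total = sum(counts.values())
    return {m: counts[m] / total for m in MOVES}

def fork(agent, rng, noise=0.1):
    """Copy an agent and perturb its strategy slightly, then renormalise."""
    raw = {m: max(1e-6, agent[m] + rng.uniform(-noise, noise)) for m in MOVES}
    total = sum(raw.values())
    return {m: raw[m] / total for m in MOVES}

def play(a, b, rng, rounds=50):
    """Return a's net score against b over a number of rounds."""
    score = 0
    for _ in range(rounds):
        move_a = rng.choices(MOVES, weights=[a[m] for m in MOVES])[0]
        move_b = rng.choices(MOVES, weights=[b[m] for m in MOVES])[0]
        if BEATS[move_a] == move_b:
            score += 1
        elif BEATS[move_b] == move_a:
            score -= 1
    return score

def self_play_league(seed_agent, generations=20, size=8, rng=None):
    """Stage 2 (toy self-play): round-robin tournament, keep and fork the winners."""
    if rng is None:
        rng = random.Random(0)
    population = [fork(seed_agent, rng) for _ in range(size)]
    for _ in range(generations):
        totals = [0] * size
        for i in range(size):
            for j in range(i + 1, size):
                result = play(population[i], population[j], rng)
                totals[i] += result
                totals[j] -= result
        ranked = sorted(zip(totals, population), key=lambda t: t[0], reverse=True)
        survivors = [agent for _, agent in ranked[: size // 2]]
        # Refill the population with perturbed copies of the survivors.
        children = [fork(rng.choice(survivors), rng) for _ in range(size - len(survivors))]
        population = survivors + children
    return population

# Pretend "pro replays" that favour rock; the league starts from this prior.
seed = imitate(["rock"] * 8 + ["paper"] * 2)
league = self_play_league(seed, generations=5, size=6, rng=random.Random(1))
```

Every agent in `league` descends from the imitation-trained seed, but what survives is decided only by games against other agents, which is the distinction being drawn here.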


devilmoon 2699 days ago

Meaning that the final models evolved by playing against each other, and they all started out using pro strategies. So, again, to me it seems kind of obvious that they would end up using pro strategies and, in the best case, just try to improve on them.
