


	by cygaril 1691 days ago
	Standard RL algorithms will converge to optimal play against a fixed opponent, but are not guaranteed to find an optimal policy via self-play. One intuitive way to see this: a sequence of improving pure policies A < B < C < ... will converge to optimal play in a perfect-information game like chess, but not necessarily in an imperfect-information game like rock/paper/scissors, where Rock < Paper < Scissors < Rock and so on, so each "improvement" just moves around a cycle instead of approaching an optimum.
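
	To make the cycle concrete, here is a minimal Python sketch, not any particular RL algorithm. It assumes the simplest possible "self-play": each new policy is the pure best response to the previous one. The names (PAYOFF, best_response, self_play_sequence, payoff_vs_uniform) are made up for illustration.

```python
# Rock/paper/scissors payoff to the row player.
# Actions are indexed Rock=0, Paper=1, Scissors=2.
ACTIONS = ["Rock", "Paper", "Scissors"]
PAYOFF = [
    [0, -1, 1],   # Rock: ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper: beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]


def best_response(opponent):
    """Pure action with the highest payoff against a fixed pure opponent."""
    return max(range(3), key=lambda a: PAYOFF[a][opponent])


def self_play_sequence(start, steps):
    """Naive self-play: each new policy is the best response to the last one."""
    seq = [start]
    for _ in range(steps):
        seq.append(best_response(seq[-1]))
    return seq


def payoff_vs_uniform(action):
    """Expected payoff of a pure action against the uniform mixed strategy."""
    return sum(PAYOFF[action]) / 3


if __name__ == "__main__":
    # Every step strictly beats the previous policy, yet the chain loops.
    print(" < ".join(ACTIONS[a] for a in self_play_sequence(0, 6)))
    # No pure action gains anything against the uniform mix.
    print([payoff_vs_uniform(a) for a in range(3)])
```

	Each policy in the chain beats its predecessor, so every step looks like progress against the then-fixed opponent, but the chain returns to Rock every three steps. The uniform mix, which no pure action can exploit, is never reached by stepping through pure best responses.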