	by wavemode 610 days ago
	Yeah, but won't it also be learning from the mistakes and missed tactics? (Assuming its reward function is telling it to predict the human's move rather than to actually win.)
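A toy sketch of the point, under loud assumptions: the move counts below are made up, and this isn't how any particular engine is trained. If the training objective is cross-entropy against the moves humans actually played in a position, the best-fitting policy just reproduces how often humans played each move, blunder included. Nothing in that objective penalizes the blunder.

```python
import math

# Hypothetical counts of which move human players chose in one position.
human_choices = {"best": 70, "ok": 20, "blunder": 10}
moves = list(human_choices)
total = sum(human_choices.values())
targets = [human_choices[m] / total for m in moves]


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fit_imitation(targets, steps=5000, lr=0.5):
    """Gradient descent on the cross-entropy between the human move
    distribution and a softmax policy. The gradient of that loss with
    respect to the logits is (policy - targets)."""
    logits = [0.0] * len(targets)
    for _ in range(steps):
        p = softmax(logits)
        logits = [l - lr * (pi - ti) for l, pi, ti in zip(logits, p, targets)]
    return softmax(logits)


policy = fit_imitation(targets)
for m, p in zip(moves, policy):
    print(f"{m}: {p:.3f}")
```

The fitted policy ends up playing the blunder about 10% of the time, because that's exactly what "predict the human's move" rewards. A policy trained to win would instead push probability toward whichever move has the best outcome.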