| HN Mirror



	by sanxiyn 1158 days ago
	AlphaZero in fact improves based on its own output, but I agree it is a special case and probably not generalizable.

1 comment

Buttons840 1158 days ago

It's RL though. Its output comes, in part, from interaction with an environment. It also has a well-defined objective (win games). GPT doesn't have a clear objective other than "do more of this".
