Hacker News
by
sanxiyn
1158 days ago
AlphaZero in fact improves based on its own output, but I agree it is a special case and probably not generalizable.
Buttons840
1158 days ago
It's RL, though. Its output comes, in part, from interaction with an environment. It also has a well-defined objective (win games). GPT doesn't have a clear objective other than "do more of this".