by sanxiyn 1158 days ago
AlphaZero in fact improves based on its own output, but I agree it is a special case and probably not generalizable.
1 comment

It's RL, though. Its output comes, in part, from interaction with an environment, and it has a well-defined objective (win games). GPT doesn't have a clear objective other than "do more of this".