Hacker News new | ask | show | jobs
by tromp 2186 days ago
I did not see an evaluation of how close to perfection the agent becomes. Did you compute any sort of error rate (by finding moves that turn a won position into a non-won one or a drawn position into a lost one) ? And how this error rate drops over time as learning advances? That would indeed be very interesting to see.
2 comments

My team did an implementation of alpha zero connect four a couple of years ago. Our findings are in a series of blog posts starting at https://medium.com/oracledevs/lessons-from-implementing-alph.... We didn't manage to get to perfection either on policy, but got pretty close. You can play against some versions of the network here: https://azfour.com
Your series of blog articles has been an important source of inspiration in writing AlphaZero.jl and I cite it frequently in the documentation. Thanks to you and your team!
Such an evaluation is available in the tutorial: https://jonathan-laurent.github.io/AlphaZero.jl/dev/tutorial...

Admittedly, the connect four agent is still far from perfect but there is a lot of margin for improvement as I have done very little hyperparameters tuning so far.