Hacker News
by placebo 3313 days ago
I find it interesting that AlphaGo improves its play by playing against itself. I wonder what the limits of this are.
1 comment

In RL you have two modes, "explore" and "exploit". In explore mode the agent doesn't always select the best known move; instead it sometimes selects a promising move for which it has less experience. This is how the surprising new strategies are discovered: in self-play there's no shame in losing.
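
To make the explore/exploit distinction concrete, here is a minimal sketch in Python. It assumes a UCB1-style bonus as the way to favor less-tried moves, which is one common choice and not a claim about AlphaGo's actual selection rule. The function name `select_move`, the `stats` layout, and the constant `c` are all made up for illustration.

```python
import math


def select_move(stats, explore=True, c=1.4):
    """Pick a move from stats, a dict of {move: (wins, visits)}.

    Exploit mode: take the move with the best observed win rate.
    Explore mode: add a bonus that grows for rarely tried moves
    (UCB1-style, an assumption for this sketch).
    """
    total_visits = sum(visits for _, visits in stats.values())

    def win_rate(move):
        wins, visits = stats[move]
        return wins / visits if visits else 0.0

    def ucb_score(move):
        wins, visits = stats[move]
        if visits == 0:
            return math.inf  # a move never tried gets tried first
        bonus = c * math.sqrt(math.log(total_visits) / visits)
        return wins / visits + bonus

    return max(stats, key=ucb_score if explore else win_rate)


# Hypothetical self-play statistics: move "a" is well known,
# move "c" has a decent record but very little experience.
stats = {"a": (60, 100), "b": (5, 10), "c": (2, 4)}

print(select_move(stats, explore=False))  # exploit: best known win rate
print(select_move(stats, explore=True))   # explore: favors the under-tried move
```

With these numbers, exploit mode sticks with "a" (60% over 100 games), while explore mode picks "c", whose record is thinner but not yet ruled out. Losing a few games on "c" costs nothing in self-play and is how the estimate for it gets better.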