by paulsutter
3317 days ago
In RL you have two modes, "explore" and "exploit". In explore mode the agent doesn't always select the best known move; instead it selects a promising move that it has less experience with. This is how the surprising new strategies are discovered: in self-play there's no shame in losing.
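To make the two modes concrete, here is a minimal sketch using a UCB1-style bonus, which is one common way to favor "promising but less tried" moves. It is an illustration only, not necessarily what any particular system does; `select_move`, the `stats` layout, and the constant `c` are all hypothetical names chosen for this example.

```python
import math

def select_move(stats, explore=True, c=1.4):
    """Pick a move from stats, a dict of move -> (wins, visits).

    Hypothetical helper: exploit mode returns the best average win rate;
    explore mode adds a bonus that grows for moves with few visits.
    """
    total = sum(visits for _, visits in stats.values())

    def score(move):
        wins, visits = stats[move]
        if visits == 0:
            # Untried moves are tried first in explore mode.
            return math.inf if explore else 0.0
        mean = wins / visits
        if not explore:
            return mean
        # UCB1-style bonus: larger when this move has less experience.
        return mean + c * math.sqrt(math.log(total) / visits)

    return max(stats, key=score)

# Made-up statistics: "a" is the best known move, "b" is promising but rarely tried.
stats = {"a": (60, 100), "b": (5, 10), "c": (1, 10)}
print(select_move(stats, explore=False))  # best known move
print(select_move(stats, explore=True))   # promising move with less experience
```

With these made-up numbers, exploit mode picks "a" (highest win rate), while explore mode picks "b", whose win rate is nearly as good but rests on far fewer games.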