by fallingfrog 3755 days ago
I think that's what the AlphaGo team did: they trained their agent against itself, and it learned new moves that were not explicitly programmed in, with an evaluation function that just says ahead / not ahead.
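
To make the self-play idea concrete, here is a toy sketch. It is not AlphaGo's actual method (that used deep neural networks and tree search); it swaps in a small lookup table and a much simpler game, Nim, where players alternately take 1 to 3 stones and whoever takes the last stone wins. The names `train_self_play`, `best_move`, and `legal_moves`, and all the parameter values, are made up for illustration. The only feedback is win or lose at the end of a game, as a stand-in for "ahead / not ahead", and both sides share one table, so the agent is always playing against itself.

```python
import random

def legal_moves(stones):
    # A player may take 1, 2, or 3 stones, but never more than remain.
    return [a for a in (1, 2, 3) if a <= stones]

def best_move(q, stones):
    # Greedy move under the current table; unseen entries count as 0.
    # max() keeps the first maximal element, so ties go to the smallest move.
    return max(legal_moves(stones), key=lambda a: q.get((stones, a), 0.0))

def train_self_play(episodes=20000, max_pile=12, alpha=0.5, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = {}  # (stones, action) -> estimated outcome for the player to move
    for _ in range(episodes):
        stones = rng.randint(1, max_pile)
        while stones > 0:
            # Epsilon-greedy: mostly play the current best guess, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(legal_moves(stones))
            else:
                action = best_move(q, stones)
            remaining = stones - action
            if remaining == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # The opponent is the same agent, so its best outcome is our worst.
                target = -max(q.get((remaining, a), 0.0)
                              for a in legal_moves(remaining))
            old = q.get((stones, action), 0.0)
            q[(stones, action)] = old + alpha * (target - old)
            stones = remaining  # the other side moves next, using the same table
    return q

q = train_self_play()
for pile in (5, 6, 7):
    print(pile, "stones -> take", best_move(q, pile))
```

Nothing in the code states the known winning rule for this game (leave your opponent a multiple of 4). With enough games the table tends to find it anyway, which is the small-scale version of learning moves that were not programmed in.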