by wavemode, 610 days ago
Yeah, but won't it also be learning from the mistakes and missed tactics? (Assuming its training objective rewards it for predicting the human's move, rather than for actually winning.)
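To make the worry concrete, here is a rough toy sketch, not a claim about how any real model is trained. The move labels, their frequencies, and the `move_value` scores are all made up for illustration. It compares a policy that just matches the human move frequencies (which is what minimizing prediction error against human games converges to) with a policy that only cares about the outcome.

```python
from collections import Counter

# Hypothetical data: what humans played in one position (labels are invented).
human_moves = ["best"] * 6 + ["ok"] * 3 + ["blunder"] * 1
# Hypothetical outcome score for each move (1.0 = win, 0.0 = loss).
move_value = {"best": 1.0, "ok": 0.5, "blunder": 0.0}

def imitation_policy(moves):
    """Empirical move frequencies: the distribution that best predicts the humans."""
    counts = Counter(moves)
    total = sum(counts.values())
    return {move: count / total for move, count in counts.items()}

def winning_policy(values):
    """Put all probability on the highest-valued move."""
    best = max(values, key=values.get)
    return {move: (1.0 if move == best else 0.0) for move in values}

def expected_value(policy, values):
    """Average outcome when moves are sampled from the policy."""
    return sum(prob * values[move] for move, prob in policy.items())

imitate = imitation_policy(human_moves)
win = winning_policy(move_value)

print("imitation policy:", imitate)
print("imitation expected value:", expected_value(imitate, move_value))
print("winning expected value:", expected_value(win, move_value))
```

In this toy setup the imitation policy still plays the blunder as often as the humans did, so its expected outcome is lower than that of the policy that optimizes for winning.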