Hacker News new | ask | show | jobs
by netdevnet 613 days ago
Exactly. Level x (whatever scalar thing the user meant by that) doesn't quite work out for the reason you outlined. X Level Players have different tactics and someone that can use all of them will likely be better than most if not all those those players. I got downvoted for saying that. Maybe I didn't phrase it as well as you did
1 comments

Yeah but, won't it also be learning from the mistakes and missed tactics too? (Assuming its reward function is telling it to predict the human's move, rather than actually trying to win.)