Hacker News
Superhuman Stratego with RL and test-time search (arxiv.org)
2 points by algo_trader 218 days ago
1 comment

Only 2000 GPU hours.
Heavily customized network.
95% win rate in a recent human tournament sample.
Several training techniques for evaluation/learning rate.