

by cgreerrun 2190 days ago

I'm trying to answer that right now, actually.

For Connect 4, once it's trained a bit it seems to do really well. At 800 MCTS playouts (<0.4s), it goes from "I can beat it and sometimes we draw" before training to "I pray for a draw, it almost always beats me" after ~4 hours of training.

Connect 4 is a solved game, so it should be possible to sample the space of the trillions of (position, who should win?, what are the best move(s)?) tuples and compare the answers to your value/policy models to get some kind of objective error. I haven't had time to do that, but it's nice to have that benchmark so you don't have to run a "ladder tournament" against some reference bot(s), like you do for Go, where you don't know what ideal play is.
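A rough sketch of what that benchmark could look like, assuming you already have solver-labeled samples. Everything here is hypothetical naming for illustration: `SolvedPosition`, `benchmark`, the string position encoding, and the `value_fn`/`policy_fn` callables standing in for the trained models. It reports mean squared error for the value model and how often the policy's top move is one of the solver's best moves.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class SolvedPosition:
    # Hypothetical encoding: the position as a string of played columns.
    position: str
    # Solver verdict for the side to move: +1 win, 0 draw, -1 loss.
    outcome: int
    # Columns the solver considers optimal in this position.
    best_moves: FrozenSet[int]


def benchmark(
    samples: Sequence[SolvedPosition],
    value_fn: Callable[[str], float],
    policy_fn: Callable[[str], Sequence[float]],
) -> Tuple[float, float]:
    """Return (value MSE, policy top-1 accuracy) against solver labels."""
    if not samples:
        raise ValueError("need at least one solved position")
    squared_error = 0.0
    policy_hits = 0
    for sample in samples:
        # Value model: how far is the prediction from the solved outcome?
        squared_error += (value_fn(sample.position) - sample.outcome) ** 2
        # Policy model: is the highest-probability move an optimal one?
        probs = policy_fn(sample.position)
        top_move = max(range(len(probs)), key=lambda col: probs[col])
        policy_hits += top_move in sample.best_moves
    n = len(samples)
    return squared_error / n, policy_hits / n
```

Top-1 accuracy counts any optimal move as correct, which matters because many Connect 4 positions have more than one best move.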

After training it for 10 hours on Quoridor (using my personal laptop), it still can't beat me, but it doesn't seem anywhere close to plateauing. The agents go from aimlessly wandering around the board looking for the victory row and randomly placing walls, to placing walls that thwart the opponent and navigating to the victory row.

I decided to implement PCR and try out some self-play techniques on Connect Four before I give Quoridor another go; a few days of self-play improvements can speed up training 10x. That's where I'm at now...

Once I test a few strategies I was thinking of firing up a c5a.24xlarge, a 96-core box on AWS, and giving it another go. It's ~$1-2/hr at the spot price so I can probably do a lot of damage for $50 or so.
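For a sense of scale, here is the back-of-the-envelope arithmetic behind that budget, as a small sketch. The function name `training_budget` is made up for illustration, and it assumes the spot price stays flat for the whole run, which it may not.

```python
from typing import Tuple


def training_budget(
    budget_usd: float, price_per_hour_usd: float, cores: int
) -> Tuple[float, float]:
    """Return (wall-clock hours, core-hours) a budget buys at a flat hourly price."""
    hours = budget_usd / price_per_hour_usd
    return hours, hours * cores


# The two ends of the quoted ~$1-2/hr spot price range, on a 96-core box.
for price in (1.0, 2.0):
    hours, core_hours = training_budget(50.0, price, 96)
    print(f"${price:.0f}/hr: {hours:.0f} hours, {core_hours:.0f} core-hours")
```

So $50 buys somewhere between 25 and 50 hours on the box, or roughly 2,400 to 4,800 core-hours.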