by dragontamer
2823 days ago
DOTA is a bad example. Poker is a better one, because Nash equilibrium-estimating algorithms have begun to outperform humans in the past year or two. Pokemon, like poker, is a game of bluffing and partial information. I expect Pokemon's optimal strategy to be the same mix of fold (aka: switch to a defensive Pokemon, eating an attack but minimizing the opponent's damage to your team) and bluff (stay in, maybe with a move that exactly counters your opponent's choice. Ex: an unrevealed Choice Scarf Draco Meteor, surprising an opponent who expected your Pokemon to be slower).
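To make the fold/bluff mix concrete, here is a minimal sketch that treats a single turn as a 2x2 zero-sum game and solves for the mixed Nash equilibrium in closed form. The payoff numbers and the function name `solve_2x2_zero_sum` are made up for illustration; real Pokemon turns have far more options and hidden information, so treat this as a toy model rather than how any actual bot works.

```python
from fractions import Fraction


def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum game given payoffs to the row player.

    Returns (p, q, value), where p is the probability the row player
    picks row 0, q is the probability the column player picks column 0,
    and value is the row player's expected payoff at equilibrium.
    """
    (a, b), (c, d) = payoff

    # If a pure-strategy saddle point exists, nobody needs to mix.
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin = max(row_mins)
    minimax = min(col_maxs)
    if maximin == minimax:
        p = Fraction(1 if row_mins[0] == maximin else 0)
        q = Fraction(1 if col_maxs[0] == minimax else 0)
        return p, q, Fraction(maximin)

    # Otherwise each side mixes so the other is indifferent
    # between its two options.
    denom = Fraction(a - b - c + d)
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical payoffs to the player whose Pokemon is threatened.
# Rows: stay in (bluff), switch out (fold).
# Columns: opponent attacks, opponent predicts the switch.
TOY_PAYOFF = [
    [-4, 3],   # stay in:    eat the hit / punish the read
    [1, -2],   # switch out: absorb the hit / get caught switching
]

p_stay, q_attack, value = solve_2x2_zero_sum(TOY_PAYOFF)
print(p_stay, q_attack, value)  # prints: 3/10 1/2 -1/2
```

With these invented numbers, the threatened player stays in 30% of the time and folds 70% of the time, and the opponent splits evenly between attacking and calling the switch. Change the payoffs and the mix changes, which is the poker-like part: the bluff frequency falls out of the payoffs rather than being a matter of style.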
There's definitely a recognizable 'tempo' to Pokemon, where A picks a move that threatens B, B switches to something that can take it and threaten back, then A in turn switches to take the hit and threaten back. Playing that tempo accurately, much like just betting your hand strength accurately in poker, is enough to beat a lot of amateurs. The metaphor goes from there, though I might use 'raise' for leaving a threatened Pokemon exposed, which lets us differentiate a strong hand ("I'll use a coverage move with higher speed") from a bluff ("I can hit his switch if I call it"). As an example, take an opening of Koko v Landorus: the fold is switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is Thunderbolt.
The basic ebb and flow of the game seems like it's that and one more layer - double switches and attempts to predict them. Above that, there's just not enough probability mass left to benefit from trying to triple switch, counter-counter-switch, and so on.
Of course, it's all made vastly more complicated by trying to trap, set hazards or status, and make space for setup moves. I'm not sure what it would take to get an unsupervised learner to value e.g. Rocks appropriately. My experience has been that neural nets struggle badly at assessing that sort of long-term state change, though of course I'm not working at OpenAI or DeepMind levels.