Hacker News
by nbeleski 2813 days ago
Given the results obtained by OpenAI in Dota[1] (with asymmetrical teams, no less), I am pretty confident RL could be used to train a pretty efficient Pokemon PvP agent. In my experience, the nuances and mindgames/predictions in a Pokemon battle are much simpler than those in a high-level chess/go game.

I would say the model isn't as straightforward as the Mario or Sonic AI players, but it is still achievable. Actually, I wish I had more time because this is definitely a project I would like to tackle.

[1] https://blog.openai.com/openai-five/

3 comments

What experience do you have? Because this seems opposite to my expectations; I doubt standard techniques will do all that well. The only thing harder about chess seems to be that people take it more seriously, so the average skill of the playerbase is higher.

DOTA is a bad example.

Poker is a better example, because Nash-equilibrium-estimating algorithms have begun to perform better than humans in the past year or two.

Pokemon, like Poker, is a game of bluffing and partial information. I expect Pokemon's optimal strategy to be the same kind of mix of fold (aka: switch your Pokemon out to a defensive Pokemon... eating an attack but minimizing the opponent's damage to your team) and bluff (stay in, maybe use a move that exactly counters your opponent's choice. Ex: An unrevealed Choice Scarf Draco Meteor, surprising the opponent because your Pokemon is faster than expected).
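
A minimal sketch of what "estimating a Nash equilibrium" can look like on a toy version of the fold/bluff decision. It uses regret matching, the update rule inside counterfactual regret minimization (CFR), on a 2x2 zero-sum game. The payoff numbers and the name `regret_matching` are assumptions made up for illustration; nothing here comes from a real poker or Pokemon library.

```python
def regret_matching(payoff, iterations=50000):
    """Approximate an equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player gets the negative.
    Returns the time-averaged mixed strategies (row, column).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_total, col_total = [0.0] * n_rows, [0.0] * n_cols

    def current_strategy(regret):
        # Play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in regret]
        norm = sum(positive)
        if norm <= 0.0:
            return [1.0 / len(regret)] * len(regret)
        return [p / norm for p in positive]

    for _ in range(iterations):
        row = current_strategy(row_regret)
        col = current_strategy(col_regret)
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(p * v for p, v in zip(row, row_values))
        col_ev = sum(p * v for p, v in zip(col, col_values))
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_total[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_total[j] += col[j]

    return ([t / iterations for t in row_total],
            [t / iterations for t in col_total])


# Hypothetical payoffs for the threatened player (rows: stay in, switch out)
# against the attacker (columns: hit the current mon, hit the predicted switch-in).
PAYOFF = [[-2.0, 2.0],
          [1.0, -1.0]]

defender_mix, attacker_mix = regret_matching(PAYOFF)
print("defender [stay, switch]:", defender_mix)
print("attacker [hit current, hit switch-in]:", attacker_mix)
```

With these made-up payoffs the exact equilibrium is to stay in one third of the time and switch two thirds of the time, while the attacker splits 50/50; the averaged strategies should land close to that. Neither pure "always fold" nor pure "always bluff" survives, which is the point of the poker comparison.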

The poker analogy seems like the right one to use, at least for studying the sorts of things an agent would need to do, although Pokemon is made messier by the level of variance. (Meaning both "semi-random effects" and also "far more than 52 possibilities for mons and moves".) I'd imagine the completely-hidden playstyles would be incredibly hard for an AI to learn, but the popular Showdown style that has team preview might be workable.

There's definitely a recognizable 'tempo' to pokemon, where A picks a move that threatens B, B switches to something that can take it and threaten back, then A in turn switches to take the hit and threaten back. Which, much like just accurately betting your hand strength in poker, is enough to beat a lot of amateurs. The metaphor goes from there - though I might use 'raise' for leaving a threatened pokemon exposed, which lets us differentiate a strong hand ("I'll use a coverage move with higher speed") from a bluff ("I can hit his switch if I call it.") As an example, opening Koko v Landorus. The fold is switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is Thunderbolt.

The basic ebb and flow of the game seems like it's that and one more layer - double switches and attempts to predict them. Above that, there's just not enough probability mass left to benefit from trying to triple switch, counter-counter-switch, and so on.

Of course, it's all made vastly more complicated by trying to trap, set hazards or status, and make space for setup moves. I'm not sure what it would take to get an unsupervised learner to value e.g. Rocks appropriately. My experience has been that neural nets struggle badly on assessing that sort of long term state change, though of course I'm not working at OpenAI or DeepMind levels.

> The metaphor goes from there - though I might use 'raise' for leaving a threatened pokemon exposed, which lets us differentiate a strong hand ("I'll use a coverage move with higher speed") from a bluff ("I can hit his switch if I call it.") As an example, opening Koko v Landorus. The fold is switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is Thunderbolt.

I'd argue that the raise is U-Turn :-), which instantly wins any switching contest (ex: U-Turn on the switch, leaving the option to switch into Magnezone to trap the Skarmory, or if Lando stays in you can switch to your dedicated Lando counter... not that Lando really has a solid counter, mind you, but you get the idea).

The U-Turn war between Lando and Koko, however, demonstrates the bluffing game once again. Koko staying in and doing something weird like Calm Mind, or even Reflect/Light Screen would be absurd, but it would definitely beat the Lando U-Turn in most cases.

> Koko staying in and doing something

Heh, good example. I keep running into defog Koko, I think precisely for this reason. In raw number terms it's not a great use of a Koko or a moveslot, but Koko forces so many U-Turns or outright switches that it's a strong way to gain momentum. And if Lan-T just switched out to avoid HP Ice, the check might not be ground, opening the door to Volt Switch away for even more momentum. Taking a time-biding move for specific switches is a pretty great example of this back-and-forth pattern.

(Although - I'm not sure Lan can/does U-Turn on Koko? If it's scarfed it can lead with Earthquake for a kill, if it isn't it'll drop to HP Ice before the turn.)

It really depends on what I'm predicting. U-Turn on Lando wins against a surprising number of options:

* Beats Koko Volt-Switch: Lando is immune, so Koko fails to switch out.

* Beats the Koko U-Turn: Lando is slower, so as the 2nd U-Turner you capture the switching momentum.

* Beats the Koko Thunderbolt: It's prediction-on-top-of-predictions going on here, but this happens sometimes.

* Beats the Koko Hard-Switch: Hey, maybe they thought your Lando was scarf'd so they hard switch out.

--------

* Loses to HP-ice: This is the "obvious move" for Koko to do, and will happen more often than not. But as you go up the ranks, people start going for 2nd tier or 3rd tier mind-games, and you see fewer and fewer "obvious moves", especially in the early game where momentum is such a big deal.

It really depends where you are on the ladder: how stupid or aggressive you think your opponent is and all that.
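
The "depends where you are on the ladder" point can be sketched as an expected-value calculation. The win/lose table follows the list above; the +1/-1 scoring and both opponent distributions (`LOW_LADDER`, `HIGH_LADDER`) are invented numbers for illustration only, not measured usage stats.

```python
# Outcome of Lando's U-Turn against each Koko option, per the list above.
# +1 = U-Turn comes out ahead, -1 = it loses (the scale is an assumption).
UTURN_OUTCOME = {
    "Volt Switch": 1,
    "U-Turn": 1,
    "Thunderbolt": 1,
    "Hard switch": 1,
    "HP Ice": -1,
}


def expected_value(outcomes, opponent_mix):
    """Average outcome of one move against a probability mix of replies."""
    total = sum(opponent_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("opponent probabilities must sum to 1")
    return sum(prob * outcomes[move] for move, prob in opponent_mix.items())


# Hypothetical opponents: low ladder mostly clicks the "obvious" HP Ice,
# high ladder spreads out over the mind-game options.
LOW_LADDER = {"HP Ice": 0.7, "Volt Switch": 0.1, "U-Turn": 0.1,
              "Thunderbolt": 0.05, "Hard switch": 0.05}
HIGH_LADDER = {"HP Ice": 0.3, "Volt Switch": 0.25, "U-Turn": 0.2,
               "Thunderbolt": 0.1, "Hard switch": 0.15}

ev_low = expected_value(UTURN_OUTCOME, LOW_LADDER)
ev_high = expected_value(UTURN_OUTCOME, HIGH_LADDER)
print("U-Turn EV vs low ladder: ", round(ev_low, 3))
print("U-Turn EV vs high ladder:", round(ev_high, 3))
```

Under this scoring the U-Turn only pays off once the opponent picks HP Ice less than half the time, so the same move flips from bad to good as the opponent pool changes.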

The comparison to chess and go seems strange to me; I wonder if you could elaborate?

Certainly chess has a mental component, players develop styles, study one another, and try to throw opponents off balance. But all of that happens as a layer on top of the need to actually make good moves over the board - a bishop and knight endgame simply has a correct answer. Go is less constrained, but it's still alternating turns in a deterministic, perfect information setting.

Pokemon, meanwhile, looks to me somewhere between DOTA and poker. It's nondeterministic on crits, paralysis, accuracy, and a great deal else. It's effectively nondiscrete, in the sense that there's lots of variance which has only a chance of mattering. And it's heavily hidden-information - defining features like moveset aren't revealed. Meanwhile, the OpenAI Dota restrictions are heavily centered on removing hidden information (invisibility, wards) and unexpected state changes (summons, quelling blade, infused raindrop).

I expect Pokemon would be more tractable on these issues because the hidden information is usually discrete. (Think "does he have Protect" as opposed to "is he standing invisible on this pixel?") But they're still major stumbling blocks, especially with randomness that massively expands the branching factor of each interaction. A given Pokemon move might look something like "if Ferrothorn uses Leech Seed, will it hit, and if so will he switch out, and if so will he go to Kartana, and if he does will it Swords Dance or does it have Choice Band or does it have Fightinium Z, or will he go to Koko, and if so does it have HP Fire or is it a bluff?" Everything there past "use this move" is laboring under a high branching factor with high randomness.
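
To put a rough number on that branching, here is a small sketch that enumerates the Leech Seed example. The switch-ins and their variants are the ones named above; treating hit/miss, stay/switch and the variant as independent branches, and the helper name `one_turn_branches`, are simplifying assumptions.

```python
# Plausible variants per switch-in, taken from the example above.
SWITCH_INS = {
    "Kartana": ["Swords Dance", "Choice Band", "Fightinium Z"],
    "Tapu Koko": ["HP Fire", "no HP Fire"],
}


def one_turn_branches(switch_ins):
    """List every (Leech Seed result, opponent mon, variant) outcome for one turn."""
    branches = []
    for seed_result in ("hit", "miss"):
        # The opponent may simply stay in.
        branches.append((seed_result, "stays in", None))
        # Or switch to any candidate, each with several hidden variants.
        for mon, variants in switch_ins.items():
            for variant in variants:
                branches.append((seed_result, mon, variant))
    return branches


branches = one_turn_branches(SWITCH_INS)
per_turn = len(branches)  # 2 * (1 + 3 + 2)
for depth in (1, 2, 3, 4):
    # Assumes the same branching repeats on every later turn.
    print(f"{depth} turn(s) ahead: {per_turn ** depth} scenarios")
```

Even this heavily trimmed version, with one move, two candidate switch-ins and no crits or damage rolls, gives 12 branches per turn and 1728 after three turns.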

I don't think it'd be impossible to do fairly well on the Pokemon Showdown ladder with a medium amount of advance work; an AI can run a damage calculator and just assume every enemy has one of the recommended movesets from the wiki, and be assigned a viable team with relatively low variance and branching. But if you take away any of that hand curation, I expect things would go downhill pretty fast. And if you take it out of Showdown premades into a format where the enemy lineup isn't known in advance, I'd expect the now-intractable branching factor to lead to very poor performance with incredibly slow progress.
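
A very rough sketch of the "damage calculator plus assumed movesets" baseline, using the Koko v Landorus matchup from elsewhere in the thread. The type multipliers and base powers are the in-game ones, but `toy_damage` is a deliberately crude stand-in for a real damage formula, and the set names, weights and bulk factors in `ASSUMED_SETS` are hypothetical placeholders rather than real usage data.

```python
# Attack type -> defender type -> multiplier (anything missing is 1.0).
TYPE_CHART = {
    "Electric": {"Ground": 0.0, "Flying": 2.0},
    "Ice": {"Ground": 2.0, "Flying": 2.0},
    "Bug": {"Flying": 0.5},
}

# Move -> (type, base power) for the attacking Tapu Koko.
MOVES = {
    "Thunderbolt": ("Electric", 90),
    "HP Ice": ("Ice", 60),
    "U-Turn": ("Bug", 70),
    "Volt Switch": ("Electric", 70),
}
ATTACKER_TYPES = ("Electric", "Fairy")
DEFENDER_TYPES = ("Ground", "Flying")  # Landorus-Therian

# Hypothetical (set name, probability, bulk factor) guesses for the opponent.
ASSUMED_SETS = [("Choice Scarf", 0.6, 1.0), ("Defensive", 0.4, 1.5)]


def toy_damage(move, bulk):
    """Crude score: base power x STAB x type effectiveness / bulk."""
    move_type, power = MOVES[move]
    multiplier = 1.5 if move_type in ATTACKER_TYPES else 1.0
    for defender_type in DEFENDER_TYPES:
        multiplier *= TYPE_CHART.get(move_type, {}).get(defender_type, 1.0)
    return power * multiplier / bulk


def expected_damage(move):
    """Average the toy score over the assumed opposing sets."""
    return sum(weight * toy_damage(move, bulk) for _, weight, bulk in ASSUMED_SETS)


best_move = max(MOVES, key=expected_damage)
print("greedy pick:", best_move)
```

A greedy picker like this always lands on the "obvious" move against the mon in front of it, which is roughly why such a baseline could hold its own at low ranks yet be exploitable by anyone who predicts it.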

It'd be a damn interesting experiment, though.