| HN Mirror


by dragontamer 2823 days ago

DOTA is a bad example.

Poker is a better example, because Nash-equilibrium-estimating algorithms have begun to perform better than humans in the past year or two.

Pokemon, like Poker, is a game of bluffing and partial information. I expect Pokemon's optimal strategy to be the same mix of fold (aka: switch your Pokemon out to a defensive Pokemon... eating an attack but minimizing the opponent's damage to your team), and bluff (stay in, maybe use a move that exactly counters your opponent's choice. Ex: An unrevealed Choice Scarf Draco Meteor, surprising the opponent with a Pokemon that is faster than they expected).
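To make the "mix of fold and bluff" idea concrete, here is a minimal sketch of a single turn treated as a 2x2 zero-sum game. The payoff numbers are made up purely for illustration (they are not measured from real battles), and `solve_2x2_zero_sum` is a hypothetical helper name. It uses the standard closed-form solution for a 2x2 zero-sum game that has no pure-strategy saddle point.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy equilibrium of the zero-sum game [[a, b], [c, d]].

    Entries are payoffs to the row player. Returns (p, q, value), where
    p is the probability the row player picks row 1 and q is the
    probability the column player picks column 1.
    """
    # If a pure-strategy saddle point exists, nobody needs to mix and the
    # closed-form formulas below do not apply.
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("pure-strategy saddle point; no mixing needed")

    denom = Fraction(a - b - c + d)
    p = (d - c) / denom          # row player's weight on row 1
    q = (d - b) / denom          # column player's weight on column 1
    value = (a * d - b * c) / denom
    return p, q, value


# ASSUMED payoffs to the attacker (illustrative only):
#                         defender stays   defender switches
# attack what's in front        +1               -1
# aim at the switch-in          -1               +2
p_attack, q_stay, game_value = solve_2x2_zero_sum(1, -1, -1, 2)
print(p_attack, q_stay, game_value)
```

With these assumed payoffs neither side has a safe pure strategy, so both have to randomize. That is the same reason a poker player cannot always bet honestly.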


Bartweiss 2823 days ago

The poker analogy seems like the right one to use, at least for studying the sorts of things an agent would need to do, although Pokemon is made messier by the level of variance. (Meaning both "semi-random effects" and also "far more than 52 possibilities for mon and moves".) I'd imagine the completely-hidden playstyles would be incredibly hard for an AI to learn, but the popular Showdown style that has team preview might be workable.

There's definitely a recognizable 'tempo' to pokemon, where A picks a move that threatens B, B switches to something that can take it and threaten back, then A in turn switches to take the hit and threaten back. Which, much like just accurately betting your hand strength in poker, is enough to beat a lot of amateurs. The metaphor goes from there - though I might use 'raise' for leaving a threatened pokemon exposed, which lets us differentiate a strong hand ("I'll use a coverage move with higher speed") from a bluff ("I can hit his switch if I call it.") As an example, opening Koko v Landorus. The fold is switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is Thunderbolt.

The basic ebb and flow of the game seems like it's that and one more layer - double switches and attempts to predict them. Above that, there's just not enough probability mass left to benefit from trying to triple switch, counter-counter-switch, and so on.

Of course, it's all made vastly more complicated by trying to trap, set hazards or status, and make space for setup moves. I'm not sure what it would take to get an unsupervised learner to value e.g. Rocks appropriately. My experience has been that neural nets struggle badly on assessing that sort of long term state change, though of course I'm not working at OpenAI or DeepMind levels.


dragontamer 2823 days ago

> The metaphor goes from there - though I might use 'raise' for leaving a threatened pokemon exposed, which lets us differentiate a strong hand ("I'll use a coverage move with higher speed") from a bluff ("I can hit his switch if I call it.") As an example, opening Koko v Landorus. The fold is switching Koko to Skarmory, the honest raise is HP Ice, and the bluff is Thunderbolt.

I'd argue that the raise is U-Turn :-). Which instant-wins any switching contest (ex: U-Turn on the switch, leaving the option to switch into Magnezone to trap the Skarmory, or if Lando stays in you can switch to your dedicated Lando counter... not that Lando really has a solid counter, mind you, but you get the idea).

The U-Turn war between Lando and Koko, however, demonstrates the bluffing game once again. Koko staying in and doing something weird like Calm Mind, or even Reflect/Light Screen, would be absurd, but it would definitely beat the Lando U-Turn in most cases.


Bartweiss 2822 days ago

> Koko staying in and doing something

Heh, good example. I keep running into Defog Koko, I think precisely for this reason. In raw number terms it's not a great use of a Koko or a moveslot, but Koko forces so many U-Turns or outright switches that it's a strong way to gain momentum. And if Lan-T just switched out to avoid HP Ice, the check might not be a Ground type, opening the door to Volt Switch away for even more momentum. Taking a time-biding move for specific switches is a pretty great example of this back-and-forth pattern.

(Although - I'm not sure Lan can/does U-Turn on Koko? If it's scarfed it can lead with Earthquake for a kill, if it isn't it'll drop to HP Ice before the turn.)


dragontamer 2821 days ago

It really depends on what I'm predicting. U-Turn on Lando wins against a surprising number of options:

* Beats the Koko Volt Switch: Lando is immune, so Koko fails to switch out.

* Beats the Koko U-Turn: Lando is slower, so as the 2nd U-Turner you capture the switching momentum.

* Beats the Koko Thunderbolt: It's prediction-on-top-of-predictions going on here, but this happens sometimes.

* Beats the Koko Hard-Switch: Hey, maybe they thought your Lando was scarf'd so they hard switch out.

--------

* Loses to HP Ice: This is the "obvious move" for Koko to do, and will happen more often than not. But as you go up the ranks, people start going for 2nd tier or 3rd tier mind-games, and you see fewer and fewer "obvious moves", especially in the early game where momentum is such a big deal.

It really depends where you are on the ladder: how stupid or aggressive you think your opponent is and all that.
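A rough sketch of that "depends where you are on the ladder" point: score each matchup in the list above as +1 (U-Turn wins the exchange) or -1 (it loses), then weight by how often you think Koko picks each option. The +1/-1 scoring and both probability tables are assumptions invented for illustration, not real usage statistics, and `u_turn_expected_score` is a hypothetical helper name.

```python
# Outcome of Lando's U-Turn against each Koko option, taken from the list
# above. The +1 / -1 scoring is a simplifying ASSUMPTION.
U_TURN_RESULT = {
    "Volt Switch": 1,
    "U-Turn": 1,
    "Thunderbolt": 1,
    "Hard switch": 1,
    "HP Ice": -1,
}


def u_turn_expected_score(koko_move_probs):
    """Expected +1/-1 score of U-Turn given a guess at Koko's move mix."""
    total = sum(koko_move_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("move probabilities must sum to 1")
    return sum(prob * U_TURN_RESULT[move]
               for move, prob in koko_move_probs.items())


# ASSUMED opponent models (made-up numbers):
# lower ladder mostly clicks the "obvious" HP Ice ...
low_ladder = {"HP Ice": 0.70, "Volt Switch": 0.10, "U-Turn": 0.05,
              "Thunderbolt": 0.05, "Hard switch": 0.10}
# ... higher ladder plays more mind-games and fewer obvious moves.
high_ladder = {"HP Ice": 0.35, "Volt Switch": 0.25, "U-Turn": 0.15,
               "Thunderbolt": 0.05, "Hard switch": 0.20}

low_score = u_turn_expected_score(low_ladder)
high_score = u_turn_expected_score(high_ladder)
print(low_score, high_score)
```

Under this crude scoring, U-Turn only comes out ahead once you believe the opponent clicks HP Ice less than half the time. So the whole question is how "obvious" you expect them to play.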
