Hacker News
by ionforce 2877 days ago
Maybe this is a limitation of self-play. If the opponent an AI faces during training is always optimal, then there's no surface area of mistakes to learn from. The losing AI, in its model/mind, knows that the game is over past a specific threshold, so it never learns how to capitalize on mistakes.

I wonder if this situation can be fixed by adding more randomness. For example, force AI'1 to be in a losing position to AI'2, but then suddenly switch the power level of AI'2 to be much weaker (where mistakes happen) so that AI'1 learns how to fight its way out of tough situations.
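To make the idea concrete, here is a minimal sketch of a per-episode opponent-strength schedule. Everything in it is an assumption for illustration: the function name `opponent_strength_schedule`, the `strong`/`weak` values, and the notion that "power level" can be expressed as a single number per step. It is not how any particular self-play system does it.

```python
import random

def opponent_strength_schedule(num_steps, switch_prob, strong=1.0, weak=0.3, rng=None):
    """Return a hypothetical per-step opponent strength for one episode.

    The opponent starts at `strong`. With probability `switch_prob` the
    episode contains one switch point after which it plays at `weak`,
    so the learner sometimes gets a chance to fight back from behind.
    """
    if rng is None:
        rng = random.Random()
    schedule = [strong] * num_steps
    if num_steps > 1 and rng.random() < switch_prob:
        # Keep at least one strong step so the learner starts out losing.
        switch_at = rng.randrange(1, num_steps)
        for t in range(switch_at, num_steps):
            schedule[t] = weak
    return schedule
```

With `switch_prob=0.0` this reduces to plain self-play against a full-strength opponent; raising it mixes in more comeback episodes.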

1 comments

One of the most interesting takeaways from the post-game interview, for me, was that the AI can be very stupid if you just blindly throw it into a self-play setting. But with clever use of randomization (modifying power levels) and action restrictions (for example, only allowing the agent to spend an anti-invis item when a nearby enemy goes out of sight), it is possible to provide better learning opportunities for the AI.
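An action restriction like the anti-invis one is commonly implemented as an action mask applied before sampling. A rough sketch, assuming a toy three-action agent; the action names, `action_mask`, and `masked_probabilities` are made up for illustration and are not taken from the system discussed in the interview:

```python
import math

# Hypothetical action set for illustration only.
ACTIONS = ["move", "attack", "use_anti_invis"]

def action_mask(enemy_just_vanished):
    """Allow the anti-invis item only when a nearby enemy just went out of sight."""
    return [True, True, enemy_just_vanished]

def masked_probabilities(logits, mask):
    """Softmax over the allowed actions; masked actions get probability 0."""
    allowed = [l for l, ok in zip(logits, mask) if ok]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    m = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

Even if the policy strongly prefers the item, `masked_probabilities(logits, action_mask(False))` assigns it zero probability, so the agent cannot waste it while every enemy is visible.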
> for example, only allowing anti-invis items when an enemy goes out of sight

This is the kind of restriction you specifically don't want to code in, because you're throwing in human knowledge. You want the AI to learn by itself that using anti-invis when everyone is visible is a low-value move.

The purist in me was even mad that they had a hand-crafted evaluation function (e.g. prefer gold, prefer taking towers, each given some arbitrary value).
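For anyone unsure what "hand-crafted" means here, this is roughly the shape of such a function: a weighted sum over game events. The event names and weights below are invented for illustration and are not the values the actual system used.

```python
# Hypothetical weights; the real values are not given in this thread.
SHAPING_WEIGHTS = {
    "gold_gained": 0.01,   # small reward per unit of gold
    "towers_taken": 1.0,   # larger reward per tower
    "win": 5.0,            # terminal reward for winning
}

def shaped_reward(events, weights=SHAPING_WEIGHTS):
    """Weighted sum of event counts; events without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in events.items())
```

Each weight is a piece of human judgment about what matters, which is exactly the purist's objection.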