by ionforce
2877 days ago
Maybe this is a limitation of self-play. If the opponent an AI faces during training always plays optimally, there are no mistakes for it to learn to exploit. Once its position drops past a certain threshold, the losing AI's model treats the game as already over, so it never learns how to capitalize on an opponent's mistakes. I wonder if this could be fixed by adding more randomness. For example, force AI'1 into a losing position against AI'2, then suddenly drop AI'2's strength to a much weaker level (where mistakes happen) so that AI'1 learns how to fight its way out of tough situations.
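A minimal sketch of that opponent schedule, assuming a turn-based game where an opponent is just a function of the game state and the list of legal moves. The names `make_switching_opponent`, `switch_move`, and `blunder_rate` are hypothetical, and "weaker" is modeled here as playing a random legal move with some probability. That is only one simple way to inject mistakes, not an established training method.

```python
import random
from typing import Any, Callable, Sequence

# Hypothetical policy signature: (state, legal_moves) -> chosen move.
Policy = Callable[[Any, Sequence[Any]], Any]


def make_switching_opponent(
    strong_policy: Policy,
    switch_move: int,
    blunder_rate: float,
    seed: int = 0,
) -> Callable[[Any, Sequence[Any], int], Any]:
    """Wrap a strong policy so it plays at full strength until `switch_move`,
    then plays a uniformly random legal move with probability `blunder_rate`."""
    rng = random.Random(seed)  # seeded so training runs are reproducible

    def opponent(state: Any, legal_moves: Sequence[Any], move_number: int) -> Any:
        # Phase 1: full strength, pushing AI'1 into a losing position.
        if move_number < switch_move:
            return strong_policy(state, legal_moves)
        # Phase 2: weakened opponent that sometimes blunders,
        # giving AI'1 mistakes it can learn to capitalize on.
        if rng.random() < blunder_rate:
            return rng.choice(list(legal_moves))
        return strong_policy(state, legal_moves)

    return opponent
```

For example, with `switch_move=30` and `blunder_rate=0.2`, AI'2 plays its best move for the first 30 moves and then blunders on roughly one move in five, so AI'1 sees winnable positions it would never reach against a perfect opponent.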