https://github.com/DevJac/solve_the_spire

I stretched the truth a bit: I'm actually doing something like "hierarchical model-free reinforcement learning". Even so, figuring out how to break the game down into a hierarchy of agents is a lot of work. Basically, the AI is composed of about 8 different traditional RL agents (neural networks), each deciding a different thing: one chooses which cards to draft, one chooses which actions to take in combat, one chooses which path to take on the map, etc.

Simple rules like "play random cards until your energy is used up" can sometimes beat the act 1 boss on their own. My AI is barely above that and still far from solving the game. It shows definite signs of improvement, but it has only reached the point where it can beat the act 1 boss about 50% of the time, and I think that is its limit right now. I'm not convinced even DeepMind or other researchers could solve Slay the Spire right now.

I'm using policy gradient, which is very sample inefficient. I'm going to implement soft actor-critic and see whether its better sample efficiency lets it do better.

One thing I like about Slay the Spire is that it's an environment to solve, not a competition. Gamers like to talk about PvP and PvE; well, I prefer AI vs. environment over AI vs. AI. In the end an AI will win the competition, no surprise. An AI solving a new kind of environment is much more exciting IMHO.
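To make the "hierarchy of agents" idea concrete, here is a minimal sketch. It is not the code from the repository above. Each decision type is routed to its own agent (the keys `"draft"` and `"map"` and the classes `SoftmaxAgent` and `HierarchicalAgent` are hypothetical), and every agent is updated with plain REINFORCE, a basic policy-gradient method, using the episode's final return. The tabular softmax preferences stand in for the neural networks and ignore game state, and crediting every sub-decision with the same terminal return is an assumption, not something the post specifies.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class SoftmaxAgent:
    """Hypothetical stand-in for one per-decision network: a state-free
    softmax policy over a fixed number of actions, trained with REINFORCE."""
    n_actions: int
    lr: float = 0.1
    prefs: list = field(default_factory=list)

    def __post_init__(self):
        if not self.prefs:
            self.prefs = [0.0] * self.n_actions

    def probs(self):
        # Numerically stable softmax over the action preferences.
        m = max(self.prefs)
        exps = [math.exp(p - m) for p in self.prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, rng):
        return rng.choices(range(self.n_actions), weights=self.probs())[0]

    def update(self, action, ret):
        # REINFORCE: d log pi(a) / d pref_k = 1[k == a] - pi(k).
        p = self.probs()
        for k in range(self.n_actions):
            grad = (1.0 if k == action else 0.0) - p[k]
            self.prefs[k] += self.lr * ret * grad


class HierarchicalAgent:
    """Routes each decision type to its own sub-agent (assumed design)."""

    def __init__(self, action_counts, lr=0.1):
        self.agents = {kind: SoftmaxAgent(n, lr) for kind, n in action_counts.items()}
        self.trajectory = []  # (decision type, action) pairs for this episode

    def decide(self, kind, rng):
        action = self.agents[kind].act(rng)
        self.trajectory.append((kind, action))
        return action

    def end_episode(self, episode_return):
        # Assumption: every sub-decision is credited with the same final return.
        for kind, action in self.trajectory:
            self.agents[kind].update(action, episode_return)
        self.trajectory.clear()


def train_toy(episodes=2000, seed=0):
    """Toy 'run': draft one of 3 cards, pick one of 2 paths; the run is
    won (return 1) only if card 2 is drafted and path 0 is taken."""
    rng = random.Random(seed)
    ai = HierarchicalAgent({"draft": 3, "map": 2}, lr=0.1)
    for _ in range(episodes):
        card = ai.decide("draft", rng)
        path = ai.decide("map", rng)
        ai.end_episode(1.0 if (card == 2 and path == 0) else 0.0)
    return ai
```

Even in this toy, each sub-agent only learns from the sparse end-of-run signal, which is one way to see why plain policy gradient needs so many samples.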
For example, when deciding which cards to play, you often need to take into account what is coming up next on the map; it is not sufficient to consider only how to win the current fight. Relics such as Incense Burner carry their turn counters over between fights, so it's a strong strategy to delay the end of the current fight in order to set up an optimal Incense Burner counter for the next fight. What that counter should be depends heavily on which enemies, elites, or bosses you'll be facing in the next fight.
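The stalling arithmetic can be sketched as follows. This is a rough sketch under stated assumptions, not a description of the game's code: it assumes the relic fires every 6 turns (`PERIOD`), that its counter advances by 1 each turn, and that it resets to 0 when it fires. `first_trigger_turn` and `turns_to_stall` are hypothetical helpers.

```python
# Assumption: the relic fires every PERIOD turns and its counter
# (0..PERIOD-1) carries over between fights, advancing 1 per turn.
PERIOD = 6


def first_trigger_turn(counter: int, period: int = PERIOD) -> int:
    """Turn of the next fight on which the relic first fires,
    given the counter value carried into that fight."""
    return period - counter


def turns_to_stall(counter: int, target_turn: int, period: int = PERIOD) -> int:
    """Extra turns to spend in the current fight so that the next
    fight's first trigger lands on target_turn (1..period)."""
    desired_counter = period - target_turn
    return (desired_counter - counter) % period


# Example: with the counter at 2, the relic would first fire on turn 4
# of the next fight; stalling 2 more turns moves that to turn 2.
print(first_trigger_turn(2), turns_to_stall(2, target_turn=2))
```

Picking `target_turn` is the hard part: it depends on when the next enemy's big attack lands, which is exactly the cross-fight information the reply argues an isolated combat agent lacks.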
An expert system would have a database of every opponent in the game and when they are likely/guaranteed to appear and then seek to optimize the various conditions at the end of the current fight so that the next fight goes as smoothly as possible. I don't see how this could be accomplished with separate agents each attempting to play a different component of the game in isolation.