Whoa, so cool!!
You know what would be even cooler? If you could have it play any game described by the Game Description Language [1]. It looks like the project is most of the way there, since the environment methods look like calls to data that would be included in a GDL description.
At one time there was a Stanford course on the subject of General Game Players. The final project was to submit a player and see how it performed on a small set of games described by GDL. These were variations on Backgammon, Othello, and similar, with some rule changes.
While much of the course material survives [1], those rulesets do not. The only GDL example I could find was the somewhat trivial Tic Tac Toe; see section 2.6, "Tic Tac Toe Game Rules", here [2].
It is research code (read: very unpolished), but you can get inspiration from pyggp [1], specifically the game_description_language module. I implemented pyggp for my master's thesis. It's a proof of concept and will be iterated upon.
I learned about GDL several years ago from folks working on General Game Player AIs, that is, AIs that can play any well-described game. They were working primarily in Java at the time. A casual search shows that, shockingly, there is no Python library available for GDL (yet); they are still using Java.
Noob here. How is this different than reinforcement learning libraries like:
OpenAI’s Gym
TensorFlow’s TF-Agents
ReAgent by Meta
DeepMind’s OpenSpiel
Amazon SageMaker RL
If this is a faithful reimplementation of the AlphaZero algorithm (and I haven't looked through the code to confirm whether or not it is) then you'd expect equal performance to the published results after enough iterations of training. But the author probably doesn't have the resources to train agents on the same scale as Google did, and so performance in your own usage would largely come down to how long you can afford to train for.
> get_legal_actions(): returns a list of legal actions
What's the expectation around your actions? It's not just 0..n for current actions with any arbitrary ordering, right? There needs to be some consistency between steps for training.
The policy function outputs the probability of taking every possible (legal or illegal) action. Once you have a way of indexing those actions, the policy and the game need to agree on it: a given index must always refer to the same action, at every step.
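To make the indexing point concrete, here is a minimal sketch for Tic Tac Toe. It is not taken from the library: the board encoding (0 = empty), the `get_legal_actions(board)` signature, and the `masked_policy` helper are all assumptions made for illustration. The idea is that index i always means the same cell, and illegal actions are masked out of the policy output before it is used.

```python
N_ACTIONS = 9  # assumed fixed action space: one index per cell, row-major


def get_legal_actions(board):
    """Return the indices of empty cells (hypothetical signature).

    Index i always refers to cell (i // 3, i % 3), regardless of the
    current position, so indices stay consistent between steps.
    """
    return [i for i, cell in enumerate(board) if cell == 0]


def masked_policy(policy_probs, legal_actions):
    """Zero out illegal actions and renormalize (hypothetical helper)."""
    legal = set(legal_actions)
    masked = [p if i in legal else 0.0 for i, p in enumerate(policy_probs)]
    total = sum(masked)
    if total == 0.0:
        # The policy put all its mass on illegal moves:
        # fall back to a uniform distribution over the legal ones.
        return [1.0 / len(legal) if i in legal else 0.0
                for i in range(len(policy_probs))]
    return [p / total for p in masked]


# X in the top-left corner, O in the centre, everything else empty.
board = [1, 0, 0, 0, -1, 0, 0, 0, 0]
legal = get_legal_actions(board)
probs = masked_policy([1.0 / N_ACTIONS] * N_ACTIONS, legal)
```

With this convention the policy always outputs 9 numbers, and the game only decides which of those 9 slots are currently playable.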
How would this handle games with random or incomplete information? Such as UNO, craps, etc. (I'd love to see what this thing does with a known losing game, just as a validation.)
AlphaZero has been made for perfect information games. That said, the Monte Carlo Tree Search in the library can be run with any agent that implements a value and policy function. So, while the AlphaZeroAgent in agents.py wouldn't fit the problem you are describing, implementing something like Meta's ReBeL (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-bot...) shouldn't be an impossible task. The Monte Carlo Tree Search algorithm in mcts.py has been written to be modular from the start exactly to do something like this!
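As a rough illustration of what "any agent that implements a value and policy function" can mean, here is a sketch of such an interface together with the AlphaZero-style PUCT selection score that a tree search would compute from it. This is not the code in mcts.py or agents.py: the `Agent` protocol, the method names `value_fn` and `policy_fn`, `UniformAgent`, `puct_scores`, and the `c_puct` default are all assumptions made for the example.

```python
import math
from typing import List, Protocol, Sequence


class Agent(Protocol):
    """Hypothetical interface: anything with these two methods can be searched."""

    def value_fn(self, state: Sequence[int]) -> float: ...

    def policy_fn(self, state: Sequence[int]) -> List[float]: ...


class UniformAgent:
    """Baseline agent with no knowledge: value 0 and a uniform prior."""

    def __init__(self, n_actions: int) -> None:
        self.n_actions = n_actions

    def value_fn(self, state: Sequence[int]) -> float:
        return 0.0

    def policy_fn(self, state: Sequence[int]) -> List[float]:
        return [1.0 / self.n_actions] * self.n_actions


def puct_scores(agent: Agent, state: Sequence[int], q_values: List[float],
                visit_counts: List[int], c_puct: float = 1.5) -> List[float]:
    """Score each child as Q + c_puct * P * sqrt(sum N) / (1 + N)."""
    priors = agent.policy_fn(state)
    sqrt_total = math.sqrt(sum(visit_counts))
    return [q + c_puct * p * sqrt_total / (1 + n)
            for q, p, n in zip(q_values, priors, visit_counts)]


agent = UniformAgent(n_actions=3)
scores = puct_scores(agent, state=[0, 0, 0],
                     q_values=[0.0, 0.0, 0.0], visit_counts=[3, 1, 0])
best_action = max(range(len(scores)), key=scores.__getitem__)
```

The search only ever talks to the agent through those two methods, so swapping in a different agent (a neural network, a heuristic, or something designed for imperfect information) would not require touching the selection logic.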
You'll also likely want to mention the "needs python >= 3.8" requirement in the README https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f... On the one hand, I would hope folks are keeping their Pythons current, but on the other hand, dev environments are gonna dev environment.