Hacker News
Show HN: Easily train AlphaZero-like agents on any environment you want (github.com)
87 points by s-casci 911 days ago
7 comments

This repo and the code files appear to be missing any licensing details

You'll also likely want to mention the "needs python >= 3.8" in the readme https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f... OT1H, I would hope folks are keeping their pythons current, but OTOH dev environments are gonna dev environment

Good catches, I've added the missing information. Thanks
Whoa, so cool!! You know what would be even cooler? If you could have it play any game described by the Game Description Language [1]. It looks like the project is most of the way there, since the environment methods look like calls to data that would be included in a GDL description.

[1]. https://en.wikipedia.org/wiki/Game_Description_Language

Are there any sample game descriptions for some games? I have checked all the links but couldn’t find a single example.
At one time there was a Stanford course on the subject of General Game Players. The final project was to submit a player and see how it performed on a small set of games described by GDL. These were variations on Backgammon, Othello, and similar, with some rule changes.

While much of the course material survives [1], those rulesets do not. The only GDL example I could find was the somewhat trivial example of Tic Tac Toe, see section 2.6 Tic Tac Toe Game Rules here [2].

[1]. http://ggp.stanford.edu/public/lessons.php

[2]. http://ggp.stanford.edu/chapters/chapter_02.html

An email for Michael Genesereth, teacher of the course, is on the course website. I might shoot an e-mail and see if he has GDL files to share.

Interesting, I didn't know about it... Modifying the existing environments' interfaces shouldn't be too difficult. Feel free to submit a PR!
Didn't know about this formalism! Are there any Python libraries that support GDL?
It is research code (read: very unpolished) but you can get inspiration from pyggp [1]. More specifically, the game_description_language module. I implemented pyggp for my master's thesis. It's a proof of concept and will be iterated upon.

[1] github.com/Entze/pyggp

I learned about GDL several years ago from folks working on General Game Player AIs, that is, AIs that could play any well-described game. They were working primarily in Java at the time. A casual search shows that, shockingly, there is no Python library available for GDL (yet); they are still using Java.

http://www.ggp.org/

https://github.com/ggp-org/

Noob here. How is this different than reinforcement learning libraries like OpenAI’s Gym, TensorFlow’s TF-Agents, ReAgent by Meta, DeepMind’s OpenSpiel, and Amazon SageMaker RL?
There certainly are other projects around AlphaZero; I'd say this one is simpler and much more basic.
Fwiw openai no longer develops or maintains “gym”, which might dissuade some folks from investing too deeply into it.

I haven’t used it in a few years but it certainly was the standard back then

Do you have evaluations for how well the trained agents do (e.g. for chess, go, etc)?
If this is a faithful reimplementation of the AlphaZero algorithm (and I haven't looked through the code to confirm whether or not it is) then you'd expect equal performance to the published results after enough iterations of training. But the author probably doesn't have the resources to train agents on the same scale as Google did, and so performance in your own usage would largely come down to how long you can afford to train for.
I think this glosses over some details here:

> get_legal_actions(): returns a list of legal actions

What's the expectation around your actions? It's not just 0..n for current actions with any arbitrary ordering, right? There needs to be some consistency between steps for training.

The policy function outputs the probability of taking every possible (legal or illegal) action. Once you have a way of indexing those actions, both the policy and the game need to refer to the same thing when indexing the same number
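
To make that indexing contract concrete, here is a minimal sketch. It assumes a hypothetical Tic Tac Toe environment in which action i always means "mark cell i", so an index keeps its meaning across steps. The TicTacToe class and the masked_policy helper are illustrative names, not taken from the repo.

```python
import math


class TicTacToe:
    """Hypothetical environment: action i always means 'mark cell i' (0..8)."""

    NUM_ACTIONS = 9  # fixed size of the action space, legal or not

    def __init__(self):
        self.board = [0] * self.NUM_ACTIONS  # 0 = empty, 1 / -1 = players
        self.player = 1

    def get_legal_actions(self):
        # Returns indices into the fixed action space, not a renumbered 0..n list.
        return [i for i, cell in enumerate(self.board) if cell == 0]

    def step(self, action):
        self.board[action] = self.player
        self.player = -self.player


def masked_policy(logits, legal_actions):
    """Softmax over the full action space with illegal actions forced to 0."""
    legal = set(legal_actions)
    top = max(logits[a] for a in legal)  # subtract the max for numerical stability
    exps = [math.exp(x - top) if a in legal else 0.0 for a, x in enumerate(logits)]
    total = sum(exps)
    return [e / total for e in exps]


env = TicTacToe()
env.step(4)  # the centre cell is now taken, so action 4 becomes illegal
probs = masked_policy([0.0] * TicTacToe.NUM_ACTIONS, env.get_legal_actions())
```

The policy vector always has NUM_ACTIONS entries, so entry 4 refers to the centre cell in every position; the legal-action list only decides which entries get masked out.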
Maybe I'll finally be able to train a worthy opponent for Carcassonne!
If you do that, please submit a PR!
How would this handle games with random or incomplete information? Such as UNO, craps, etc. (I'd love to see what this thing does with a known losing game, just as a validation.)
The standard AlphaZero doesn't handle that. For that you'd need to graduate to more complex variants like the aforementioned ReBeL, AlphaZe* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10213697/ or BetaZero https://arxiv.org/abs/2306.00249 or ExIt-OOS https://arxiv.org/abs/1808.10120 or Player of Games https://arxiv.org/abs/2112.03178#deepmind .

(You could also move straight to MuZero variations: https://arxiv.org/abs/2106.04615#deepmind https://openreview.net/forum?id=X6D9bAHhBQ1#deepmind https://openreview.net/forum?id=QnzSSoqmAvB )

AlphaZero has been made for perfect information games. That said, the Monte Carlo Tree Search in the library can be run with any agent that implements a value and policy function. So, while the AlphaZeroAgent in agents.py wouldn't fit the problem you are describing, implementing something like Meta's ReBeL (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-bot...) shouldn't be an impossible task. The Monte Carlo Tree Search algorithm in mcts.py has been written to be modular from the start exactly to do something like this!
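
For anyone wondering what "any agent that implements a value and policy function" could look like in practice, here is a rough sketch of a tree search that takes the agent as a parameter. This is not the code in mcts.py: the names value_fn, policy_fn, Node, puct and search, the exploration constant, and the toy Nim game are all assumptions made for illustration.

```python
import math


class Nim:
    """Toy perfect-information game: remove 1 or 2 stones; taking the last stone wins."""

    def __init__(self, stones):
        self.stones = stones

    def get_legal_actions(self):
        return [a for a in (1, 2) if a <= self.stones]

    def step(self, action):
        return Nim(self.stones - action)  # returns a new state, no mutation

    def is_terminal(self):
        return self.stones == 0


class UniformAgent:
    """Hypothetical agent: any object with value_fn and policy_fn would do here."""

    def value_fn(self, game):
        return 0.0  # no opinion about who is winning

    def policy_fn(self, game):
        actions = game.get_legal_actions()
        return {a: 1.0 / len(actions) for a in actions}


class Node:
    def __init__(self, game, prior):
        self.game = game
        self.prior = prior
        self.children = {}
        self.visits = 0
        self.value_sum = 0.0  # from the perspective of the player to move here


def puct(parent, child, c=1.5):
    # The child's value is stored from the opponent's perspective, so negate it.
    q = -child.value_sum / child.visits if child.visits else 0.0
    u = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u


def search(game, agent, simulations):
    root = Node(game, 1.0)
    for _ in range(simulations):
        node, path = root, [root]
        while node.children:  # selection: follow the best PUCT score down the tree
            parent = node
            node = max(parent.children.values(), key=lambda ch: puct(parent, ch))
            path.append(node)
        if node.game.is_terminal():
            value = -1.0  # the player to move lost: the opponent took the last stone
        else:  # expansion and evaluation, both delegated to the agent
            for action, prior in agent.policy_fn(node.game).items():
                node.children[action] = Node(node.game.step(action), prior)
            value = agent.value_fn(node.game)
        for visited in reversed(path):  # backup, flipping the sign at every ply
            visited.visits += 1
            visited.value_sum += value
            value = -value
    return {action: child.visits for action, child in root.children.items()}


visit_counts = search(Nim(4), UniformAgent(), simulations=50)
```

Swapping UniformAgent for a neural-network agent, or for something ReBeL-like that reasons over beliefs, would only change what value_fn and policy_fn return. The search loop itself still assumes perfect information, so it would need changes too for games like UNO.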