Show HN: Easily train AlphaZero-like agents on any environment you want | HN Mirror

	Show HN: Easily train AlphaZero-like agents on any environment you want (github.com)
	87 points by s-casci 911 days ago

7 comments

mdaniel 911 days ago

This repo and the code files appear to be missing any licensing details

You'll also likely want to mention the "needs python >= 3.8" in the readme https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f... OT1H, I would hope folks are keeping their pythons current, but OTOH dev environments are gonna dev environment

s-casci 911 days ago

Good catches, I've added the missing information. Thanks

JoeDaDude 911 days ago

Whoa, so cool!! You know what would be even cooler? If you could have it play any game described by the Game Description Language [1]. It looks like the project is most of the way there, since the environment methods look like calls to data that would be included in a GDL description.

[1]. https://en.wikipedia.org/wiki/Game_Description_Language

vldmrs 911 days ago

Are there any sample game descriptions for some games? I have checked all the links but couldn’t find a single example.

JoeDaDude 911 days ago

At one time there was a Stanford course on the subject of General Game Players. The final project was to submit a player and see how it performed on a small set of games described by GDL. These were variations on Backgammon, Othello, and similar, with some rule changes.

While much of the course material survives [1], those rulesets do not. The only GDL example I could find was the somewhat trivial example of Tic Tac Toe, see section 2.6 Tic Tac Toe Game Rules here [2].

[1]. http://ggp.stanford.edu/public/lessons.php

[2]. http://ggp.stanford.edu/chapters/chapter_02.html

An email for Michael Genesereth, teacher of the course, is on the course website. I might shoot an e-mail and see if he has GDL files to share.

s-casci 911 days ago

Interesting, I didn't know about it... Modifying the existing environments' interfaces shouldn't be too difficult. Feel free to submit a PR!

ZiggerZZ 911 days ago

Didn't know about this formalism! Are there any Python libraries that support GDL?

Entze 911 days ago

It is research code (read: very unpolished) but you can get inspiration from pyggp [1]. More specifically, the game_description_language module. I implemented pyggp for my master's thesis. It's a proof of concept and will be iterated upon.

[1] github.com/Entze/pyggp

JoeDaDude 911 days ago

I learned about GDL several years ago from folks working on General Game Player AIs, that is, AIs that could play any well-described game. They were working primarily in Java at the time. A casual search shows that, shockingly, there is no Python library available for GDL (yet); they are still using Java.

http://www.ggp.org/

https://github.com/ggp-org/

vermaat 911 days ago

Noob here. How is this different from reinforcement learning libraries like OpenAI’s Gym, TensorFlow’s TF-Agents, ReAgent by Meta, DeepMind’s OpenSpiel, and Amazon SageMaker RL?

s-casci 911 days ago

There certainly are other projects around AlphaZero; I'd say this one is simpler and much more basic.

jasonjmcghee 911 days ago

FWIW, OpenAI no longer develops or maintains “gym”, which might dissuade some folks from investing too deeply in it.

I haven’t used it in a few years, but it certainly was the standard back then.

tomatovole 911 days ago

Do you have evaluations for how well the trained agents do (e.g. for chess, go, etc)?

Reubend 911 days ago

If this is a faithful reimplementation of the AlphaZero algorithm (and I haven't looked through the code to confirm whether or not it is) then you'd expect equal performance to the published results after enough iterations of training. But the author probably doesn't have the resources to train agents on the same scale as Google did, and so performance in your own usage would largely come down to how long you can afford to train for.

viraptor 911 days ago

I think this glosses over some details here:

> get_legal_actions(): returns a list of legal actions

What's the expectation around your actions? It's not just 0..n for current actions with any arbitrary ordering, right? There needs to be some consistency between steps for training.

s-casci 911 days ago

The policy function outputs the probability of taking every possible (legal or illegal) action. Once you have a way of indexing those actions, both the policy and the game need to refer to the same action whenever they use the same index.
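
To make the indexing point concrete, here is a minimal sketch for tic-tac-toe. It is not taken from the repo: the board encoding (0 = empty cell), the fixed index scheme (row * 3 + col), and the helper mask_policy are all assumptions for illustration. Only the name get_legal_actions comes from the quoted README line.

```python
NUM_ACTIONS = 9  # assumption: one action per cell, index = row * 3 + col


def get_legal_actions(board):
    """Return the indices of empty cells.

    The index of a given cell never changes between steps, so the policy
    and the game always agree on what "action 4" means.
    """
    return [i for i, cell in enumerate(board) if cell == 0]


def mask_policy(policy, legal_actions):
    """Hypothetical helper: zero out illegal actions and renormalize."""
    legal = set(legal_actions)
    masked = [p if i in legal else 0.0 for i, p in enumerate(policy)]
    total = sum(masked)
    if total == 0.0:
        # All probability mass was on illegal moves: fall back to a
        # uniform distribution over the legal ones.
        return [1.0 / len(legal) if i in legal else 0.0
                for i in range(len(policy))]
    return [p / total for p in masked]


# X in the top-left corner, O in the centre, everything else empty.
board = [1, 0, 0,
         0, -1, 0,
         0, 0, 0]
policy = [1.0 / NUM_ACTIONS] * NUM_ACTIONS  # stand-in for a network output
probs = mask_policy(policy, get_legal_actions(board))
```

The policy vector always has NUM_ACTIONS entries, and legality is applied as a mask on top of it, rather than renumbering the currently legal moves as 0..n at each step.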

Y_Y 911 days ago

Maybe I'll finally be able to train a worthy opponent for Carcassonne!

s-casci 911 days ago

If you do that, please submit a PR!

ilc 911 days ago

How would this handle games with random or incomplete information? Such as UNO, craps, etc. (I'd love to see what this thing does with a known losing game, just as a validation.)

gwern 911 days ago

The standard AlphaZero doesn't handle that. For that you'd need to graduate to more complex variants like the aforementioned ReBeL, AlphaZe* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10213697/ or BetaZero https://arxiv.org/abs/2306.00249 or ExIt-OOS https://arxiv.org/abs/1808.10120 or Player of Games https://arxiv.org/abs/2112.03178#deepmind .

(You could also move straight to MuZero variations: https://arxiv.org/abs/2106.04615#deepmind https://openreview.net/forum?id=X6D9bAHhBQ1#deepmind https://openreview.net/forum?id=QnzSSoqmAvB )

s-casci 911 days ago

AlphaZero has been made for perfect information games. That said, the Monte Carlo Tree Search in the library can be run with any agent that implements a value and policy function. So, while the AlphaZeroAgent in agents.py wouldn't fit the problem you are describing, implementing something like Meta's ReBeL (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-bot...) shouldn't be an impossible task. The Monte Carlo Tree Search algorithm in mcts.py has been written to be modular from the start exactly to do something like this!
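
As a rough illustration of "any agent that implements a value and policy function", here is a sketch of how a tree search can consume such an agent. This is not the code in mcts.py: the method names value_fn and policy_fn, the UniformAgent class, and puct_select are assumptions made up for this example. The selection rule is the standard AlphaZero-style PUCT score, Q + c * P * sqrt(N_parent) / (1 + N_child).

```python
import math
from typing import Protocol, Sequence


class Agent(Protocol):
    """Hypothetical interface: anything with these two methods will do."""

    def value_fn(self, state) -> float: ...

    def policy_fn(self, state) -> Sequence[float]: ...


class UniformAgent:
    """Placeholder agent: no learned knowledge, uniform priors, neutral value."""

    def __init__(self, num_actions: int) -> None:
        self.num_actions = num_actions

    def value_fn(self, state) -> float:
        return 0.0

    def policy_fn(self, state) -> Sequence[float]:
        return [1.0 / self.num_actions] * self.num_actions


def puct_select(agent: Agent, state, visit_counts, q_values, c_puct=1.5):
    """Pick the child index with the highest PUCT score.

    The search only needs the agent's priors here; it does not care
    whether they come from a neural network or a hand-written heuristic.
    """
    priors = agent.policy_fn(state)
    sqrt_parent = math.sqrt(sum(visit_counts))
    scores = [
        q + c_puct * p * sqrt_parent / (1 + n)
        for q, p, n in zip(q_values, priors, visit_counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


agent = UniformAgent(num_actions=3)
# With equal Q-values, the unvisited child wins on the exploration term.
explore = puct_select(agent, None, visit_counts=[10, 0, 5], q_values=[0.0, 0.0, 0.0])
# With equal visit counts, the child with the best Q-value wins.
exploit = puct_select(agent, None, visit_counts=[1, 1, 1], q_values=[0.9, 0.0, 0.0])
```

Swapping UniformAgent for something that reasons over hidden information would leave puct_select unchanged, which is the kind of modularity described above. A real imperfect-information method would still need more than this (for example, searching over beliefs rather than single states).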