Hacker News
by viraptor 913 days ago
I think this glosses over some details here:

> get_legal_actions(): returns a list of legal actions

What's the expectation around your actions? It's not just 0..n for current actions with any arbitrary ordering, right? There needs to be some consistency between steps for training.

1 comment

The policy function outputs a probability for every possible action, legal or illegal. Once you have a way of indexing those actions, the policy and the game must agree on it: a given index has to mean the same action on both sides, at every step.
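
To make that concrete, here is a minimal sketch assuming a tic-tac-toe-like game with a fixed action space of 9 cells, where index i always means "play cell i". Only `get_legal_actions` comes from the article. `NUM_ACTIONS`, `masked_policy`, the board encoding, and the constant logits are made up for illustration and stand in for whatever the real game and network provide.

```python
import math

# Assumption: a fixed action space where index i always means "play cell i".
NUM_ACTIONS = 9

def get_legal_actions(board):
    """Return the indices of empty cells. The indices are positions in the
    fixed action space, not positions in a per-step list."""
    return [i for i, cell in enumerate(board) if cell == " "]

def masked_policy(logits, legal_actions):
    """Softmax restricted to legal actions; illegal actions get probability 0.
    Assumes at least one legal action."""
    legal = set(legal_actions)
    max_logit = max(logits[i] for i in legal)  # subtract for numerical stability
    exps = [
        math.exp(logits[i] - max_logit) if i in legal else 0.0
        for i in range(len(logits))
    ]
    total = sum(exps)
    return [e / total for e in exps]

board = list("XO X  O  ")        # cells 0, 1, 3 and 6 are taken
logits = [0.5] * NUM_ACTIONS     # stand-in for a policy network's raw output
probs = masked_policy(logits, get_legal_actions(board))
```

The policy output keeps its full length of 9 at every step, so the same index refers to the same cell during training and during play. The legal-action list is only used as a mask over that fixed vector.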