by s-casci 911 days ago
The policy function outputs a probability for every possible action, legal or illegal. Once you have settled on a way of indexing those actions, the policy and the game must agree on it: a given index has to refer to the same action on both sides.
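A rough sketch of what that shared indexing could look like, using a 3x3 board as a made-up example. The names (`action_to_index`, `index_to_action`, `legal_mask`, `mask_and_renormalize`) are hypothetical, the uniform `raw_policy` is just a stand-in for a network output, and zeroing out illegal moves before renormalizing is my assumption about how they get handled, not something stated above.

```python
# Hypothetical setup: a 3x3 board where action index i means "play cell i".
N_ACTIONS = 9


def action_to_index(row, col):
    """Single mapping used by BOTH the game and the policy."""
    return row * 3 + col


def index_to_action(index):
    """Inverse mapping: index -> (row, col)."""
    return divmod(index, 3)


def legal_mask(board):
    """The game reports legality in the same index order the policy uses."""
    # board is a list of 9 cells; None marks an empty (playable) cell
    return [cell is None for cell in board]


def mask_and_renormalize(probs, mask):
    """Zero out illegal actions, then rescale so the rest sums to 1."""
    n_legal = sum(mask)
    if n_legal == 0:
        raise ValueError("no legal actions in this position")
    masked = [p if ok else 0.0 for p, ok in zip(probs, mask)]
    total = sum(masked)
    if total == 0.0:
        # All probability mass was on illegal moves: fall back to uniform
        return [1.0 / n_legal if ok else 0.0 for ok in mask]
    return [p / total for p in masked]


board = [None] * N_ACTIONS
board[action_to_index(1, 1)] = "X"  # centre is taken, so index 4 is illegal

raw_policy = [1.0 / N_ACTIONS] * N_ACTIONS  # stand-in for a network output
policy = mask_and_renormalize(raw_policy, legal_mask(board))
best = max(range(N_ACTIONS), key=lambda i: policy[i])
```

The point is that `legal_mask` and the policy vector are lined up position by position, so index 4 means "centre cell" in both. If the game numbered cells column-first while the policy was trained row-first, the mask would silently hit the wrong entries.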