Hacker News new | ask | show | jobs
by rytill 3041 days ago
The AlphaZero algorithm (monte carlo tree search with value estimator trained by reinforcement learning) works on any environment you can simulate during play time, single player or not.
1 comments

Any environment with finite action and state-spaces.
No, the key requirement which makes it difficult to use on real-world tasks is that you must be able to do a forward rollout of your environment in your decision-making process.