by jetnew 1303 days ago
If we train recursively-restricted reinforcement learning agents, would interesting differences emerge in their behavior? Could this even serve as a method for exploration?

Some set-up considerations: 1) actions must be discrete, or at least binned so they can be restricted; 2) the number of restriction rounds is limited by the size of the action space.
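To make the two considerations concrete, here is a rough sketch in plain Python. It rests on one reading of "restriction" that I haven't pinned down above: each round bans the action the previous round preferred. All function names are made up for illustration, and a fixed list of action values stands in for a trained agent. In a real experiment the agent would be retrained after every ban.

```python
def bin_actions(low, high, n_bins):
    """Discretize a continuous 1-D action range into n_bins evenly spaced values."""
    if n_bins < 2:
        raise ValueError("need at least two bins")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def greedy_restricted(q_values, banned):
    """Return the index of the highest-valued action that is not banned."""
    allowed = [a for a in range(len(q_values)) if a not in banned]
    if not allowed:
        raise ValueError("every action is banned")
    return max(allowed, key=lambda a: q_values[a])


def max_restriction_rounds(n_actions):
    """At least one action must stay available, so we can ban at most n - 1."""
    return n_actions - 1


def restriction_schedule(q_values):
    """Assumed rule: each round bans the previous round's greedy action.

    q_values is a stand-in for a trained agent's preferences; it is NOT
    re-estimated between rounds here, which a real setup would do.
    """
    banned = set()
    order = []
    for _ in range(max_restriction_rounds(len(q_values))):
        action = greedy_restricted(q_values, banned)
        order.append(action)
        banned.add(action)
    return order, banned


if __name__ == "__main__":
    print(bin_actions(-1.0, 1.0, 5))
    print(restriction_schedule([0.1, 0.9, 0.5]))
```

With n discrete actions this allows at most n - 1 rounds, which is the limit mentioned in 2). A state-dependent ban (forbidding only what the previous agent would do in the current state) is another possible reading and would need a different `banned` set per state.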

I would imagine that for CartPole, the balancing would become more wobbly while still being somewhat successful. But in more complicated environments, restriction could produce markedly different behaviors, because the states visited (and the resulting trajectories) could differ.