	by gabrielgoh 3268 days ago
	I agree completely that what's happening is nothing more than brute-force search. I do think this is still interesting, though, as the reward here is potentially much better conditioned than the rewards in RL. Having said that, there are situations where this will fail completely, e.g. maze solving, where the goal is not to play in order to keep playing but to play in order to reach the end.


highd 3268 days ago

It seems like the more comparable reinforcement learning approach would be to combine the entropy criterion with a known reward, when one is available, and then do Q-learning on the combined signal, which removes the simulation requirement. In cases where the reward is uncertain or infrequent, you then fall back on a flexibility heuristic.
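A minimal sketch of that idea, under loud assumptions: the "entropy criterion" is stood in for by the Shannon entropy of the successor states observed so far from the state the agent lands in, which is only a crude model-free proxy for flexibility and not the criterion from the original work. The names `successor_entropy`, `q_learning_with_flexibility`, `corridor_step` and the weight `beta` are all made up for illustration.

```python
import math
import random
from collections import defaultdict


def successor_entropy(counts):
    """Shannon entropy (nats) of an empirical successor-state distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def q_learning_with_flexibility(step, n_states, n_actions, start, episodes=500,
                                horizon=50, alpha=0.1, gamma=0.95, beta=0.01,
                                epsilon=0.1, seed=0):
    """Tabular Q-learning on (known reward + beta * flexibility bonus).

    `step(s, a)` is a hypothetical environment callback returning
    (next_state, extrinsic_reward, done). No simulator rollouts are used:
    the bonus comes only from transitions the agent has actually seen.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    # successors[s][s2] = how often s2 was observed right after s
    successors = defaultdict(lambda: defaultdict(int))
    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])  # exploit
            s2, r_ext, done = step(s, a)
            successors[s][s2] += 1
            # Assumed shaping: reward landing in states with varied futures.
            bonus = successor_entropy(successors[s2])
            bootstrap = 0.0 if done else gamma * max(q[s2])
            target = r_ext + beta * bonus + bootstrap
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s = s2
    return q


def corridor_step(s, a):
    """Toy 5-state corridor: action 0 moves left, 1 moves right, goal is state 4."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


# Q-learning is off-policy, so a purely random behaviour policy (epsilon=1.0)
# is enough to learn the greedy values on this tiny problem.
q_table = q_learning_with_flexibility(corridor_step, 5, 2, start=0, epsilon=1.0)
```

`beta` is kept small on purpose: if the bonus is large relative to the goal reward, wandering forever can be worth more than finishing, which is exactly the maze failure mode mentioned upthread.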


robertsdionne 3268 days ago

Maybe like https://pathak22.github.io/noreward-rl/
