| HN Mirror



	by highd 3268 days ago
	It seems like a more comparable reinforcement learning approach would be to combine the entropy criterion with a known reward, when one is available, and then do Q-learning on that combined signal without the simulation requirement. Then, in cases where the reward is uncertain or infrequent, you fall back to a flexibility heuristic.
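
	A rough sketch of what that could look like, as tabular Q-learning on a toy corridor. Everything here is an assumption for illustration: the names (`empirical_entropy`, `combined_signal`, `q_update`, `train`), the weight `beta`, the corridor environment, and especially the choice to measure "flexibility" as the entropy of successor states observed so far. That measure is a stand-in learned from experience, not the entropy criterion from the original work.

```python
import math
import random
from collections import defaultdict


def empirical_entropy(counts):
    """Shannon entropy (nats) of an observed successor-state count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def combined_signal(reward, flexibility, beta=0.1):
    """Known reward plus a weighted flexibility bonus.

    reward=None means "no reward information", so only the
    flexibility heuristic contributes (the fallback case).
    """
    base = 0.0 if reward is None else reward
    return base + beta * flexibility


def q_update(q, state, action, signal, next_state, actions, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update applied to the combined signal."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (signal + gamma * best_next - q[(state, action)])


def train(episodes=200, n_states=5, beta=0.1, epsilon=0.2, seed=0):
    """Hypothetical corridor: states 0..n_states-1, reward only at the right end."""
    rng = random.Random(seed)
    actions = (-1, 1)
    q = defaultdict(float)
    # successors[s][s2] counts observed transitions s -> s2 (no simulator lookahead).
    successors = defaultdict(lambda: defaultdict(int))
    for _ in range(episodes):
        state = 0
        for _ in range(50):
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state = min(max(state + action, 0), n_states - 1)
            successors[state][next_state] += 1
            reward = 1.0 if next_state == n_states - 1 else None
            flexibility = empirical_entropy(successors[next_state])
            signal = combined_signal(reward, flexibility, beta)
            q_update(q, state, action, signal, next_state, actions)
            state = next_state
            if reward is not None:
                break
    return q
```

	The point of the split is that the Q-learning update itself never changes; only the signal fed into it does, so a sparse or missing reward degrades to the flexibility term rather than to zero.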
