by highd
3268 days ago
It seems like a more comparable reinforcement learning approach would be to combine the entropy criterion with the known reward, when one is available, and then do Q-learning on that combined signal, which drops the simulation requirement. In cases where the reward is uncertain or infrequent, you would fall back to a flexibility heuristic.
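To make that concrete, here is a rough sketch of what I mean, not a tested method. Everything in it is an assumption for illustration: the `flexibility` proxy (log of the number of distinct successor states seen so far), the weight `beta`, the toy 5-state chain, and the convention that an unknown reward is passed as `None`.

```python
import math
import random
from collections import defaultdict


def flexibility(successors, state):
    """Hypothetical entropy proxy: log of how many distinct next states
    have been observed from `state` (0.0 if none or one)."""
    return math.log(max(1, len(successors[state])))


def shaped_reward(reward, flex, beta=0.1):
    """Known reward plus a weighted flexibility bonus; when the reward is
    unknown (None), fall back to the flexibility term alone."""
    if reward is None:
        return beta * flex
    return reward + beta * flex


def q_update(q, state, action, target_reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update on the shaped reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (
        target_reward + gamma * best_next - q[(state, action)]
    )
    return q[(state, action)]


def train(episodes=200, seed=0, n_states=5, epsilon=0.2):
    """Epsilon-greedy Q-learning on a toy chain; reward is only known
    (1.0) on reaching the last state, otherwise treated as unknown."""
    rng = random.Random(seed)
    actions = (-1, 1)
    q = defaultdict(float)
    successors = defaultdict(set)  # learned online, no forward simulation
    for _ in range(episodes):
        state = 0
        for _ in range(20):
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state = min(n_states - 1, max(0, state + action))
            reward = 1.0 if next_state == n_states - 1 else None
            successors[state].add(next_state)
            target = shaped_reward(reward, flexibility(successors, next_state))
            q_update(q, state, action, target, next_state, actions)
            state = next_state
            if state == n_states - 1:
                break
    return q
```

The point is only that the flexibility term is estimated from transitions the agent has already experienced, so nothing has to be rolled out ahead of time. How to weight it against the real reward is an open question here.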