by MasterScrat 2368 days ago
If you are not familiar with RL, I recommend first reading the two articles that the author links to:

- https://www.alexirpan.com/2018/02/14/rl-hard.html

- https://himanshusahni.github.io/2018/02/23/reinforcement-lea...

They are not so recent anymore, but they still capture the problem well.

Long story short: RL doesn't work yet. We're not sure it'll ever work. Some big companies are betting that it will.

> My own hypothesis is that the reward function for learning organisms is really driven from maintaining homeostasis and minimizing surprise.

Both directions are actively researched: maximizing surprise (to improve exploration), and minimizing surprise (to improve exploitation).

See e.g. "Exploration by Random Network Distillation" for the first, and "Surprise Minimizing RL in Dynamic Environments" for the second.
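
To make the two directions concrete, here is a rough sketch of both reward signals. It is not the code from either paper: the class names, the linear "networks", the diagonal Gaussian, and the hyperparameters are all my own simplifying assumptions. The first class pays a bonus where a trained predictor still disagrees with a fixed random network (surprise is rewarded), the second pays the log-likelihood of a state under a density fit to past states (surprise is penalized).

```python
import numpy as np


class RandomNetworkBonus:
    """Surprise-maximizing bonus in the spirit of RND (hypothetical, linear maps for brevity)."""

    def __init__(self, obs_dim, feat_dim=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(obs_dim, feat_dim))  # fixed, never trained
        self.predictor = np.zeros((obs_dim, feat_dim))      # trained to imitate the target
        self.lr = lr

    def _error(self, obs):
        return obs @ self.predictor - obs @ self.target

    def reward(self, obs):
        # Large prediction error means an unfamiliar observation, so a large bonus.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient step on the squared error (constant factors folded into lr).
        self.predictor -= self.lr * np.outer(obs, self._error(obs))


class SurpriseMinimizingReward:
    """Surprise-minimizing reward (hypothetical): Gaussian log-likelihood of the state."""

    def __init__(self, obs_dim):
        # Assumption: a zero placeholder state so the statistics are always defined.
        self.history = [np.zeros(obs_dim)]

    def reward(self, obs):
        data = np.stack(self.history)
        mean = data.mean(axis=0)
        var = data.var(axis=0) + 1e-2  # variance floor keeps the density finite
        # Log-density of a diagonal Gaussian fit to the states seen so far.
        return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (obs - mean) ** 2 / var))

    def update(self, obs):
        self.history.append(np.asarray(obs, dtype=float))


familiar = np.array([1.0, 0.0, 0.0, 0.0])
novel = np.array([0.0, 1.0, 0.0, 0.0])

explore = RandomNetworkBonus(obs_dim=4)
exploit = SurpriseMinimizingReward(obs_dim=4)
for _ in range(20):  # visit the same state repeatedly
    explore.update(familiar)
    exploit.update(familiar)

print("exploration bonus (familiar, novel):", explore.reward(familiar), explore.reward(novel))
print("surprise-min reward (familiar, novel):", exploit.reward(familiar), exploit.reward(novel))
```

After repeated visits to the same state, the first signal should favor the unseen state and the second should favor the familiar one, which is the explore vs. exploit split described above.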