by MasterScrat
2368 days ago
If you are not familiar with RL, I recommend first reading the two articles that the author links to:

- https://www.alexirpan.com/2018/02/14/rl-hard.html
- https://himanshusahni.github.io/2018/02/23/reinforcement-lea...

They are not so recent anymore, but they still capture the problem well. Long story short: RL doesn't work yet. We're not sure it'll ever work. Some big companies are betting that it will.

> My own hypothesis is that the reward function for learning organisms is really driven from maintaining homeostasis and minimizing surprise.

Both directions are actively researched: maximizing surprise (to improve exploration) and minimizing surprise (to improve exploitation). See e.g. "Exploration by Random Network Distillation" for the first and "Surprise Minimizing RL in Dynamic Environments" for the second.
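To make the two directions concrete, here is a rough sketch of both kinds of intrinsic reward in plain Python. It is my own heavily simplified illustration, not the papers' implementations, and the class names are hypothetical. The RND-style bonus is the squared error of a trained predictor imitating a fixed random target map (the paper uses neural networks plus normalization; here both maps are linear, an assumption for brevity), so it shrinks as a state becomes familiar. The SMiRL-style reward is the log-likelihood of the state under a diagonal Gaussian fitted to the states seen so far in the episode (the Gaussian, the unit prior and the variance floor are assumptions), so it grows as states become predictable.

```python
import math
import random


class RandomNetworkDistillationBonus:
    """Hypothetical RND-style exploration bonus (maximize surprise).

    Simplification: target and predictor are linear maps, not neural networks.
    """

    def __init__(self, state_dim, feat_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target weights; never trained.
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
                       for _ in range(feat_dim)]
        # Predictor weights, trained online to imitate the target.
        self.pred = [[0.0] * state_dim for _ in range(feat_dim)]
        self.lr = lr

    @staticmethod
    def _apply(weights, s):
        return [sum(w * x for w, x in zip(row, s)) for row in weights]

    def bonus(self, s):
        """Return squared prediction error for s, then take one SGD step."""
        t = self._apply(self.target, s)
        p = self._apply(self.pred, s)
        err = [pi - ti for pi, ti in zip(p, t)]
        reward = sum(e * e for e in err)
        # Gradient of 0.5 * ||p - t||^2 w.r.t. predictor row k is err[k] * s.
        for k, e in enumerate(err):
            for j, x in enumerate(s):
                self.pred[k][j] -= self.lr * e * x
        return reward


class GaussianSurpriseReward:
    """Hypothetical SMiRL-style reward (minimize surprise).

    Reward is log p(s) under a diagonal Gaussian fitted to states seen so far.
    """

    def __init__(self, state_dim, min_var=1e-2):
        self.n = 0
        self.total = [0.0] * state_dim
        self.total_sq = [0.0] * state_dim
        self.min_var = min_var  # variance floor (assumption) to avoid log(0)

    def reward(self, s):
        """Score s under the current model, then add s to the statistics."""
        if self.n:
            mean = [x / self.n for x in self.total]
            var = [max(q / self.n - m * m, self.min_var)
                   for q, m in zip(self.total_sq, mean)]
        else:
            # Assumed unit-Gaussian prior before any state has been seen.
            mean, var = [0.0] * len(s), [1.0] * len(s)
        logp = sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                   for x, m, v in zip(s, mean, var))
        self.n += 1
        for j, x in enumerate(s):
            self.total[j] += x
            self.total_sq[j] += x * x
        return logp


if __name__ == "__main__":
    rnd = RandomNetworkDistillationBonus(state_dim=2)
    print(rnd.bonus([1.0, 0.5]), rnd.bonus([1.0, 0.5]))  # bonus shrinks
    smirl = GaussianSurpriseReward(state_dim=2)
    print(smirl.reward([0.0, 0.0]), smirl.reward([0.0, 0.0]))  # reward grows
```

The sign is the whole difference: the first rewards the agent for reaching states its model cannot yet predict, the second for keeping itself in states its model already predicts well.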