by MasterScrat 2368 days ago

If you are not familiar with RL, I recommend first reading the two articles that the author links to:

They are not so recent anymore, but they still capture the problem well.

Long story short: RL doesn't work yet. We're not sure it'll ever work. Some big companies are betting that it will.

> My own hypothesis is that the reward function for learning organisms is really driven from maintaining homeostasis and minimizing surprise.

Both directions are actively researched: maximizing surprise (to improve exploration), and minimizing surprise (to improve exploitation).

See e.g. "Exploration by Random Network Distillation" for the first and "Surprise Minimizing RL in Dynamic Environments" for the second.
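
To make the sign difference concrete, here is a toy sketch (mine, not taken from either paper). It assumes "surprise" is the negative log-likelihood of a scalar observation under a running Gaussian fit of past observations. That is a deliberate simplification: RND measures novelty as the prediction error against a fixed random network, not with a Gaussian. The names `RunningGaussian` and `intrinsic_reward` and the mode labels are hypothetical.

```python
import math


class RunningGaussian:
    """Online mean/variance estimate of scalar observations (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self, min_var=1e-3):
        # Fall back to unit variance until there are two samples;
        # min_var is an assumed floor to avoid division by zero.
        if self.n < 2:
            return 1.0
        return max(self.m2 / (self.n - 1), min_var)

    def surprise(self, x):
        """Negative log-likelihood of x under the current Gaussian fit."""
        var = self.variance()
        return 0.5 * math.log(2 * math.pi * var) + (x - self.mean) ** 2 / (2 * var)


def intrinsic_reward(model, x, mode):
    """Same surprise signal, opposite sign depending on the goal."""
    s = model.surprise(x)
    if mode == "maximize_surprise":  # exploration bonus: seek unfamiliar states
        return s
    if mode == "minimize_surprise":  # stability reward: seek familiar states
        return -s
    raise ValueError(f"unknown mode: {mode}")


model = RunningGaussian()
for obs in [0.0, 0.1, -0.1, 0.05, -0.05]:
    model.update(obs)

familiar, novel = 0.0, 5.0
print(intrinsic_reward(model, novel, "maximize_surprise")
      > intrinsic_reward(model, familiar, "maximize_surprise"))
print(intrinsic_reward(model, novel, "minimize_surprise")
      < intrinsic_reward(model, familiar, "minimize_surprise"))
```

The only thing that changes between the two research directions in this sketch is the sign on the same quantity. In practice the interesting part is how the surprise estimate itself is built.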