
by moosinho 2830 days ago

How about an approach where the agent's reward is not the predictability itself but its first derivative, i.e. how quickly its predictions are improving? This way the agent will be attracted to the parts of the environment where it can still improve, and will avoid white-noise parts: its model of the world doesn't generalize to those, so there is no improvement to be rewarded.
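
A toy sketch of what that reward could look like, not taken from any paper. Everything here is an assumption made for illustration: the `RegionModel` class, the running-average predictor, the window size, and the two hypothetical "regions" (one emitting a constant, one emitting Gaussian noise). The reward is the drop in mean squared prediction error between an older window and a newer one.

```python
import random


class RegionModel:
    """Tiny online predictor for one region of the environment (illustrative only)."""

    def __init__(self, lr=0.1, window=20):
        self.estimate = 0.0   # current prediction of the next observation
        self.lr = lr          # step size of the running-average update
        self.window = window  # number of steps per error window
        self.errors = []      # history of squared prediction errors

    def update(self, observation):
        """Record the squared error of the current prediction, then adapt it."""
        error = (observation - self.estimate) ** 2
        self.estimate += self.lr * (observation - self.estimate)
        self.errors.append(error)
        return error

    def learning_progress(self):
        """Mean error in the older window minus mean error in the newer window."""
        w = self.window
        if len(self.errors) < 2 * w:
            return 0.0  # not enough history to estimate a slope yet
        older = sum(self.errors[-2 * w:-w]) / w
        newer = sum(self.errors[-w:]) / w
        return older - newer


def total_intrinsic_reward(observe, steps=300):
    """Sum the learning-progress reward collected while staying in one region."""
    model = RegionModel()
    total = 0.0
    for _ in range(steps):
        model.update(observe())
        total += model.learning_progress()
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable
learnable = total_intrinsic_reward(lambda: 10.0)               # constant signal
noise = total_intrinsic_reward(lambda: rng.gauss(0.0, 1.0))    # white noise

print("learnable region:", learnable)
print("white-noise region:", noise)
```

In the learnable region the error keeps falling, so the summed reward is clearly positive. In the noise region the error just fluctuates around a constant, so the positive and negative differences mostly cancel. A reward based on raw prediction error would rank the two regions the other way round.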

Juergen Schmidhuber (co-author of the original LSTM paper) had a very similar idea: http://people.idsia.ch/~juergen/driven2009.pdf

"This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and (since 1990) artificial systems."