by frereubu
2155 days ago
Although I think I get the underlying feeling you're trying to define, I'm not sure about the analogy of the experiment on mice. That feels more like it's about being in the presence of an unreliable food source, where it makes sense to stock up while food is available to get you through the lean times. There's obviously an overcompensation there, but evolutionarily derived mechanisms aren't particularly precise. The food source in this experiment is "reliably unreliable", which probably doesn't mimic real-life food sources very well.
I like to model this in terms of multi-armed bandits. Given a set of levers giving out unknown rewards, what is the optimal policy to maximize your rewards over time? A bad way would be to try all the levers until one gives you a reward, and then just keep pulling that one lever in hopes of more. This doesn't work because the other levers might have given you even better rewards.
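To make that failure mode concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the argument above: the levers pay out 0 or 1 with fixed probabilities, and the names `stick_with_first_reward` and `fraction_locked_on_worse` are made up for this example. It only measures how often the "keep pulling the first lever that paid" policy ends up stuck on a lever that is not the best one.

```python
import random


def stick_with_first_reward(payout_probs, pulls, rng):
    """Try levers in turn until one pays out, then pull only that lever.

    Returns (index of the lever the player locked onto, or None; total reward).
    Assumption: each lever pays 1 with a fixed probability, else 0.
    """
    locked = None      # lever we have committed to, once any lever pays
    candidate = 0      # lever to try next while still searching
    total = 0
    for _ in range(pulls):
        arm = locked if locked is not None else candidate
        reward = 1 if rng.random() < payout_probs[arm] else 0
        total += reward
        if locked is None:
            if reward:
                locked = arm  # first payout: commit and never look elsewhere
            else:
                candidate = (candidate + 1) % len(payout_probs)
    return locked, total


def fraction_locked_on_worse(payout_probs, runs=2000, pulls=100, seed=0):
    """Fraction of simulated runs that commit to a lever other than the best."""
    rng = random.Random(seed)
    best = max(range(len(payout_probs)), key=lambda a: payout_probs[a])
    worse = 0
    for _ in range(runs):
        locked, _total = stick_with_first_reward(payout_probs, pulls, rng)
        if locked is not None and locked != best:
            worse += 1
    return worse / runs


if __name__ == "__main__":
    # Lever 0 pays 30% of the time, lever 1 pays 90% of the time.
    print(fraction_locked_on_worse([0.3, 0.9]))
```

With these made-up numbers the player tries the poor lever first, and a single lucky payout is enough to commit to it for good, which should happen in roughly a third of runs.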
Instead, you should try to learn to predict how much reward each lever is going to give you. A fast way to do this is to focus on pulling levers that "surprise" you, i.e. where your prediction of the reward deviates from the reward you actually got. This works as long as the reward from the environment is at least in principle predictable, as it mostly is in nature. But with truly random rewards, you tend to end up with addictive behaviour. In nature this isn't a big problem, because truly random rewards are generally one-off events rather than levers you can keep pulling. So we're basically exploiting a bug in our own reward mechanisms, and evolution hasn't had time to adapt.
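A rough way to see this in code, under assumptions of my own: the learner below keeps a running prediction per lever and a running average of its absolute prediction error (its "surprise"), and always pulls the most surprising lever. The function name `surprise_driven_pulls`, the learning rate, and the three example levers are all hypothetical. It models only the curiosity part, not any attempt to cash in on the best lever.

```python
import random


def surprise_driven_pulls(levers, pulls, rng, learning_rate=0.2):
    """Always pull the lever whose recent rewards were hardest to predict.

    `levers` is a list of functions taking an rng and returning a reward.
    Returns how many times each lever was pulled.
    """
    n = len(levers)
    predictions = [0.0] * n  # current reward prediction per lever
    surprise = [1.0] * n     # start high so every lever gets tried
    counts = [0] * n
    for _ in range(pulls):
        arm = max(range(n), key=lambda a: surprise[a])
        reward = levers[arm](rng)
        error = reward - predictions[arm]
        # Move the prediction toward the observed reward.
        predictions[arm] += learning_rate * error
        # Track a running average of how wrong the prediction was.
        surprise[arm] += learning_rate * (abs(error) - surprise[arm])
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    levers = [
        lambda rng: 1.0,                        # predictable, best payout
        lambda rng: 0.5,                        # predictable, smaller payout
        lambda rng: float(rng.random() < 0.5),  # coin flip, mean 0.5
    ]
    print(surprise_driven_pulls(levers, 2000, random.Random(0)))
```

The two predictable levers stop being surprising after a handful of pulls, so the learner drops them. The coin-flip lever can never be predicted, so its surprise never decays and it soaks up almost all the pulls, even though it pays less on average than the first lever.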
Incidentally, all of this can be viewed as a mathematically and psychologically precise way of saying that the reason you get addicted to news is that you are curious.