
You are of course free to model the underlying causality as you see fit, but the fact remains that this is one of the most robust and well-understood findings in all of psychology (see e.g. the Rescorla-Wagner model). The effect does not depend on the type of reward and exists in practically all intelligent animals, including humans. As noted, video game compulsion loops and gambling machines are carefully tuned to take maximal advantage of this "variable reward ratio".

I like to model this in terms of multi-armed bandits. Given a set of levers giving out unknown rewards, what is the optimal policy to maximize your rewards over time? A bad way would be to try the levers one by one until one gives you a reward, and then just keep pulling that one lever in hopes of more. This doesn't work because the levers you never got around to, or gave up on after a single pull, might have given you even better rewards.
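To make the "bad way" concrete, here's a rough sketch, not a claim about the optimal policy. It compares the stick-with-the-first-win strategy against epsilon-greedy, which is just one simple policy that keeps exploring. The payout probabilities, the 0/1 rewards, the function names and `epsilon=0.1` are all made up for illustration.

```python
import random

def pull(probs, arm, rng):
    """Pay out 1.0 with probability probs[arm], otherwise 0.0."""
    return 1.0 if rng.random() < probs[arm] else 0.0

def stick_with_first_win(probs, steps, rng):
    """Cycle through the levers until one pays, then pull only that lever."""
    arm, committed, total = 0, False, 0.0
    for _ in range(steps):
        reward = pull(probs, arm, rng)
        total += reward
        if not committed:
            if reward > 0:
                committed = True               # first payout: never switch again
            else:
                arm = (arm + 1) % len(probs)   # no payout: move to the next lever
    return total

def epsilon_greedy(probs, steps, rng, epsilon=0.1):
    """Mostly pull the best-looking lever, but explore a random one sometimes."""
    n = len(probs)
    counts = [0] * n
    estimates = [0.0] * n                      # running mean reward per lever
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: estimates[a])
        reward = pull(probs, arm, rng)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total

def average_reward(policy, probs, steps=1000, runs=200, seed=0):
    """Mean reward per pull, averaged over several independent runs."""
    rng = random.Random(seed)
    return sum(policy(probs, steps, rng) for _ in range(runs)) / (runs * steps)

# Hypothetical levers: the last one is the best, but it is tried last.
PAYOUT_PROBS = [0.2, 0.5, 0.8]
naive_rate = average_reward(stick_with_first_win, PAYOUT_PROBS)
exploring_rate = average_reward(epsilon_greedy, PAYOUT_PROBS)
print(f"stick with first win: {naive_rate:.2f} reward per pull")
print(f"epsilon-greedy:       {exploring_rate:.2f} reward per pull")
```

The naive policy often locks onto a mediocre lever simply because it happened to pay out first, while the exploring policy keeps sampling the others and eventually settles on the best one.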

Instead, you should try to learn to predict how much reward each lever is going to give you. A fast way to do this is to focus on pulling levers that "surprise" you, i.e. where your predictions of reward deviate from the actual reward you got. This works as long as the reward from the environment is at least in principle predictable, as it mostly is in nature: once you can predict a lever, it stops surprising you and you move on. But with truly random rewards the surprise never goes away, so you keep coming back, and you tend to end up with addictive behaviour. In nature this isn't a big problem, because truly random rewards are generally one-off events. So we're basically exploiting a bug in our own reward mechanisms, and evolution hasn't had time to adapt.
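Here's a minimal sketch of the "surprise" idea, using a single-cue delta rule in the spirit of Rescorla-Wagner (prediction nudged toward each observed reward). It's a simplification, not the full model. The learning rate `alpha`, the two example levers and all the names are assumptions for illustration.

```python
import random

def surprise_trace(rewards, alpha=0.1):
    """Return |reward - prediction| at each step under a delta-rule learner."""
    prediction = 0.0
    surprises = []
    for reward in rewards:
        error = reward - prediction       # prediction error, i.e. "surprise"
        surprises.append(abs(error))
        prediction += alpha * error       # nudge the prediction toward the reward
    return surprises

def late_mean(values, tail=100):
    """Average over the last `tail` entries, after learning has settled."""
    return sum(values[-tail:]) / tail

rng = random.Random(0)
steps = 500
predictable = [1.0] * steps                                   # always pays 1.0
random_payout = [1.0 if rng.random() < 0.5 else 0.0           # coin-flip payout
                 for _ in range(steps)]

predictable_surprise = late_mean(surprise_trace(predictable))
random_surprise = late_mean(surprise_trace(random_payout))
print(f"late surprise, predictable lever: {predictable_surprise:.4f}")
print(f"late surprise, random lever:      {random_surprise:.4f}")
```

The predictable lever's surprise decays toward zero, so a surprise-seeking learner loses interest in it. The random lever never becomes predictable, so it keeps looking like the most interesting one to pull.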

Incidentally, all of this can be viewed as a mathematically and psychologically precise way of saying that the reason you get addicted to news is that you are curious.