by o09rdk 2471 days ago

The article is interesting, but only touches on the very tip of a very large iceberg, and your comments are spot on.

This issue has been a central one in psychology for decades, especially in clinical psychology. There are many variants of the problem, with lots of corollary problems.

One central challenge is that even when you say you are interested in inferences about an individual, you usually end up evaluating your inferential strategy across individuals. This becomes problematic because it's easy to identify a serial killer post hoc; it's much harder to avoid the inevitable avalanche of false positives if you apply the same criteria to thousands or tens of thousands of individuals regularly. So you're not actually interested in one single individual, and it becomes critical to be really clear about what your inferential population is and what you're actually trying to generalize to.
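To put rough numbers on the false-positive problem, here is a minimal sketch using Bayes' rule. The base rate, sensitivity, and specificity are made-up values chosen purely for illustration, and the function names are mine, not from the article.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a flagged person is a true case (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def expected_flags(population, base_rate, sensitivity, specificity):
    """Expected number of true and false flags when screening a population."""
    true_flags = population * base_rate * sensitivity
    false_flags = population * (1.0 - base_rate) * (1.0 - specificity)
    return true_flags, false_flags


# Hypothetical numbers: a very rare outcome and a screen that is
# 99% sensitive and 99% specific, applied to 10,000 people.
BASE_RATE = 1 / 10_000
SENSITIVITY = 0.99
SPECIFICITY = 0.99

ppv = positive_predictive_value(BASE_RATE, SENSITIVITY, SPECIFICITY)
true_flags, false_flags = expected_flags(10_000, BASE_RATE, SENSITIVITY, SPECIFICITY)

print(f"chance a flagged person is a true case: {ppv:.4f}")
print(f"expected true flags: {true_flags:.2f}, false flags: {false_flags:.2f}")
```

Under these assumed numbers, false flags outnumber true ones by roughly a hundred to one, even though the screen looks excellent when you check it against a known case after the fact.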

Like you're pointing out, a lot of this also reduces to the significant challenge of knowing which predictive model to use when you're effectively faced with thousands of models, if not an infinite range of them (one for each person or situation). You might improve your prediction by using a more tailored model for an individual, but then you increase the risk of model selection error. Even if you have a lot of data on a person, they will probably change, circumstances will change, and so forth. The challenge is in knowing which model to use and when, and when to decide that a particular predictive case is an exception.

This is sometimes known as the "broken leg" problem in clinical psych, so called because of a thought experiment in which you're tasked with predicting commuting behavior. You might have volumes of data on people, even on individual people, but if you learn that someone has suddenly broken their leg, or that some other anomalous event has occurred, logic compels you to alter your prediction. The trouble arises in knowing when to shift your prediction, that is, when a scenario is different enough from the one under which your default predictive model was developed. In the age of big data, you might be able to capture an increasing variety of scenarios, but there will always be cases that are not sufficiently captured, or where logic impels one to switch predictive strategies. But how do you do that?

I personally think what's undervalued is a focus on predictive uncertainty rather than the point value of the prediction. That is, realistically capturing and recognizing the true uncertainty in prediction for individuals, rather than improving the accuracy in prediction itself, which might actually have some fundamental limit.
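As a toy illustration of point value versus uncertainty, here is a minimal sketch with two invented commute-time histories. It uses plain empirical quantiles from Python's `statistics` module, which is only one simple way to express a prediction interval; the data, the function name, and the 90% coverage level are all assumptions for the example.

```python
import statistics


def predict_with_interval(history):
    """Return (low, point, high): the median plus an empirical 5%-95% band."""
    point = statistics.median(history)
    # n=20 yields 19 cut points; the first is the 5th percentile and the
    # last is the 95th. "inclusive" keeps the bounds within the observed range.
    cuts = statistics.quantiles(history, n=20, method="inclusive")
    return cuts[0], point, cuts[-1]


# Hypothetical commute times in minutes for two people.
steady = [30, 31, 29, 30, 32, 30, 31, 29, 30, 30]
erratic = [30, 10, 55, 30, 45, 15, 30, 50, 20, 30]

for name, history in (("steady", steady), ("erratic", erratic)):
    low, point, high = predict_with_interval(history)
    print(f"{name}: point={point}, 90% band=({low:.2f}, {high:.2f})")
```

Both people get the same point prediction, but the band for the second is far wider. A summary that reports only the point value hides exactly the difference that matters when deciding how much to trust the prediction for a given individual.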