Hacker News
by datastoat 2228 days ago
I would pick a value of R that shows itself to have good predictive accuracy.

The way to test predictive models is always to look at their predictive accuracy on holdout data. Machine learning has this ingrained. Classical statistics does this too: AIC is used to compare models, and it is asymptotically equivalent to leave-one-out cross-validation [1].
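To make the AIC/cross-validation point concrete, here is a minimal Python sketch on synthetic data. It assumes Gaussian errors, compares just two hypothetical models (a constant and a straight line), and drops AIC's additive constant. The equivalence is only asymptotic, so agreement on one small toy dataset illustrates the idea rather than proving it.

```python
import math
import random

def fit_mean(xs, ys):
    """Intercept-only model y = a; returns (a, b) with b fixed at 0."""
    return sum(ys) / len(ys), 0.0

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def aic(xs, ys, fit, k):
    """Gaussian-likelihood AIC up to an additive constant: n*log(RSS/n) + 2k."""
    a, b = fit(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    n = len(xs)
    return n * math.log(rss / n) + 2 * k

def loo_mse(xs, ys, fit):
    """Leave-one-out CV: refit without point i, then score the prediction for point i."""
    errs = []
    for i in range(len(xs)):
        a, b = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errs.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errs) / len(errs)

# Synthetic data: a linear trend plus Gaussian noise (seeded for repeatability).
rng = random.Random(0)
xs = [float(x) for x in range(20)]
ys = [1.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

# k counts the regression coefficients plus the noise variance.
models = {"mean": (fit_mean, 2), "line": (fit_line, 3)}
for name, (fit, k) in models.items():
    print(f"{name}: AIC={aic(xs, ys, fit, k):.2f}  LOO MSE={loo_mse(xs, ys, fit):.3f}")
```

Both criteria rank the straight line ahead of the constant here. The practical appeal of AIC is that it comes from a single fit, while leave-one-out needs n refits.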

There's nothing intrinsically wrong with models that have millions of parameters; they might overfit in which case they will have poor predictive accuracy on holdout data, or they might predict well.
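A minimal sketch of that holdout check, on synthetic data: a hypothetical "memorizer" (a 1-nearest-neighbour lookup, effectively one parameter per training point) against a two-parameter straight line. The data, the even/odd split, and the function names are illustrative assumptions, not anything from the model under discussion.

```python
import random

rng = random.Random(1)
# Toy data: a straight line plus unit-variance Gaussian noise.
xs = list(range(200))
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Even x values for training, odd x values held out.
train = [(x, y) for x, y in zip(xs, ys) if x % 2 == 0]
holdout = [(x, y) for x, y in zip(xs, ys) if x % 2 == 1]

def fit_line(points):
    """Two-parameter model: ordinary least squares y = a + b*x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(points):
    """One 'parameter' per training point: predict the y of the nearest stored x."""
    stored = list(points)
    return lambda x: min(stored, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared prediction error of `model` on (x, y) pairs."""
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    print(f"{name:9s} train MSE={mse(model, train):.3f}  holdout MSE={mse(model, holdout):.3f}")
```

The memorizer fits its training data perfectly but does worse on the held-out points. With a different data-generating process a high-parameter model could just as well come out ahead, which is why the holdout score, not the parameter count, is the thing to look at.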

I agree with the original article that software-engineering scrutiny isn't appropriate for this sort of code -- but I would argue instead that it needs a general-purpose statistician, data scientist, or ML expert to evaluate its predictive accuracy. You can't possibly figure that out just by reading a simulator's codebase.

At the time the model was published, and acted on by the UK government, there was very little data on which to test predictive accuracy. That's fine -- all it means is that the predictions should have been presented with gigantic confidence intervals.
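To illustrate how uncertainty in a single input parameter becomes uncertainty in the prediction, here is a sketch using the textbook final-size relation for a homogeneous-mixing SIR epidemic with no interventions, z = 1 - exp(-R0 z). That relation is an assumption chosen for simplicity: it is not the simulator being discussed here and will not reproduce its figures. The R0 values looped over are a made-up spread, for illustration only.

```python
import math

def final_size(r0, tol=1e-12):
    """Fraction ultimately infected in a homogeneous-mixing SIR epidemic
    with no interventions: the nonzero root z of z = 1 - exp(-r0 * z)."""
    if r0 <= 1.0:
        return 0.0  # below the threshold there is no major outbreak
    f = lambda z: z - 1.0 + math.exp(-r0 * z)
    # f is convex with f(0) = 0 and f'(0) = 1 - r0 < 0, so for r0 comfortably
    # above 1 we have f(lo) < 0 < f(hi) and exactly one root in between.
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:  # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical spread of plausible R0 values, for illustration only.
for r0 in (1.8, 2.1, 2.4, 2.7, 3.0):
    print(f"R0 = {r0:.1f} -> final size {final_size(r0):.1%}")
```

In this toy model, moving R0 from 1.8 to 3.0 shifts the predicted fraction infected from roughly 73% to roughly 94%, before any other uncertain parameter is considered.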

[1] http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf


The model isn't predictive, though; it's a simulator. If we'd waited until we had enough data to make predictions with it (which I doubt you could do, given the sheer number of parameters), it'd be too late to use any of the interventions.

How would you ethically collect training data for the interventions?

The outputs of the model _were_ being treated as predictions.

The Ferguson paper from 16 March used the language of prediction: "In the (unlikely) absence of any control measures [...] given an estimated R0 of 2.4, we predict 81% of the GB and US populations would be infected over the course of the epidemic." [1]. The news coverage also used that language: "Imperial researchers model likely impact of public health measures" [2]. And look at the rest of the comments in this discussion, and count how many times "predict" appears!

> If we'd waited until we had enough data to make predictions with it

This is like the drunk looking for their keys under a streetlight. "Did you lose the keys here?" "No, but the light is much better here." -- "How confident are you in your model's predictions?" "I have no idea, but it's the model I have."

Also -- the Ferguson model made predictions, based on the parameters they picked. You don't need to wait for data to make predictions; you only need data to validate your predictions.

> How would you ethically collect training data for the interventions?

You don't. You (as a scientist who influences public policy) should publish validated confidence intervals for your predictions. You (as a government) should understand that there is a huge margin of uncertainty in the predictions, and accept that sometimes you just have to make decisions in the absence of knowledge. You (both the scientist and the government) do not go around spouting "Our decisions are led by science".
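One way to "validate" stated intervals is an empirical coverage check: across many forecasts whose outcomes were later observed, count how often the outcome landed inside the stated interval and compare that with the nominal level. The sketch below uses synthetic forecasts and Gaussian noise; the helper names and the 90% level are illustrative assumptions.

```python
import random
from statistics import NormalDist

def coverage(intervals, outcomes):
    """Fraction of observed outcomes that fall inside their stated intervals."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)

rng = random.Random(2)
true_sd = 2.0   # actual spread of outcomes around the point forecast
nominal = 0.90  # the coverage the forecaster claims
z = NormalDist().inv_cdf(0.5 + nominal / 2)  # two-sided 90% normal quantile

# Synthetic point forecasts and the outcomes later "observed" for them.
forecasts = [rng.uniform(0.0, 100.0) for _ in range(1000)]
outcomes = [f + rng.gauss(0.0, true_sd) for f in forecasts]

def intervals_for(claimed_sd):
    """Symmetric intervals a forecaster would publish given an assumed spread."""
    return [(f - z * claimed_sd, f + z * claimed_sd) for f in forecasts]

for claimed_sd in (0.5, 2.0):  # overconfident vs. honest
    cov = coverage(intervals_for(claimed_sd), outcomes)
    print(f"claimed sd {claimed_sd}: stated {nominal:.0%}, observed coverage {cov:.1%}")
```

The overconfident intervals claim 90% but cover far fewer outcomes. Note that the check needs observed outcomes, so it can only be applied to forecasts of scenarios that actually occurred.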

[1] https://spiral.imperial.ac.uk:8443/bitstream/10044/1/77482/1...

[2] https://www.imperial.ac.uk/news/196234/covid19-imperial-rese...

How do you validate the predictions for the number of infected cases in May for scenarios that don't happen?