Hacker News
by reinkaos 2279 days ago
Are you an epidemiologist? If so, can you be more specific about the problems and propose alternatives? Thanks.

I am not...however, I am married to one. The ranting recently is kind of fun to listen to. My partner is working on covid. I learn a lot, I'm no expert in epidemiology...but...I also do/teach statistics in a biology field. My degrees are in mechanical engineering and education statistics. (And that's probably enough info to out me to any real life friends on HN ::waves::)

Basically the problem is the authors don't know enough about the data they are getting to run analysis on it. They are not clear, and absolutely need to be clear, on all things data for this paper to work.

Specific examples:

Are the websites they gather from reporting presumptive positives or confirmed positives?

Are they getting information on when specific deaths occurred? Or are they using the latest update time and the total number of deaths to date, and then assuming a connection?

Are countries accurately reporting? Are they even capable of accurately reporting?

Are they testing enough (all) of the dead or are they relying on presumptive positives?

It's a data thing. Epidemiology is really, really good at high quality data and analysis...but that takes time, more time than we have had to do good science during a crisis. It's why you don't do brain surgery in an ER. They are also really good about not overinterpreting data...because data can be hinky.

One of the biggest things people in my undergraduate stats course struggle with is specificity and sensitivity...false positives and negatives exist. That gets more complex when you consider that, to talk about true positives versus false positives, you have to have a gold standard test to reference against. So largely you are looking at one test versus another, and even if one of those tests is really accurate, or pathology based, it takes time. That test doesn't typically produce a truly binary result but a chemical measurement with a threshold that we treat as a binary. Decisions at each of those changes in data representation affect your outcomes. Then we talk about how it's impacted by prevalence and I try and teach them ROC curves and someone yells at me about how if you test positive for a disease you have that disease. (Seriously, this happens about once a semester.)
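A rough sketch of the prevalence point, using Bayes' rule. The 95% sensitivity and specificity figures are made up for illustration and are not from any real covid test; the function name is just something I picked.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of actually having the disease given a positive test."""
    true_pos = sensitivity * prevalence                    # sick and flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # healthy but flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical test: 95% sensitive, 95% specific (illustrative numbers only).
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  P(disease | positive)={ppv:.2f}")
```

With those assumed numbers, a positive result at 1% prevalence means roughly a 16% chance of having the disease, which is why "you tested positive so you have it" doesn't hold.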

Where the students are really struggling isn't math, it's philosophical. The idea that a test result isn't objective truth is kinda bonkers the first time you encounter it. Especially for my engineers. I jokingly talk in class about how it's not a math class, it's an estimation and bs detection class. I've seen this so much with covid...test results are not perfectly accurate. You can, and often should, bias them towards certain clinical goals. The first tests from CDC were problematic because they were overly sensitive (or contaminated...don't know yet, probably just too sensitive).
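To make "bias them towards certain clinical goals" a bit more concrete, here is a toy sketch of how moving the cutoff on a continuous reading trades sensitivity against specificity. The readings are invented, not real assay data, and `sens_spec` is a hypothetical helper.

```python
# Made-up assay readings (arbitrary units), for illustration only.
infected = [2.1, 3.4, 3.9, 4.6, 5.2, 6.0]
healthy = [0.5, 1.2, 1.8, 2.6, 3.1, 3.6]


def sens_spec(threshold):
    """Call a reading positive when it is >= threshold."""
    sensitivity = sum(x >= threshold for x in infected) / len(infected)
    specificity = sum(x < threshold for x in healthy) / len(healthy)
    return sensitivity, specificity


# Sweeping the cutoff traces out the points of a (very coarse) ROC curve.
for threshold in (2.0, 3.0, 4.0):
    sens, spec = sens_spec(threshold)
    print(f"cutoff={threshold:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

A low cutoff catches every infected sample here but flags half the healthy ones; a high cutoff does the reverse. Which one you want depends on what a missed case costs versus a false alarm.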

The reality more than anything is that there is so much miscommunication about covid, unavoidable miscommunication, that I'm not sure anything is fully trustworthy yet. Weeks ago one of the first rapid studies published in JAMA (or was it NEJM?) by a German group was submitted, reviewed, published, and withdrawn in about a week. People use words like 'positive test result' and can mean fundamentally different things when talking to each other and don't realize it.

And apologies, I'm not following my normal anal HN comment style and citing this a bunch. Responding from my phone.

Thanks for the detailed reply. I agree with your comments regarding data quality; I had assumed you had an issue with the methodology. I just took this paper to be a heuristic that could help inform decisions with the limited data we have so far, and not as a definitive answer.