Hacker News new | ask | show | jobs
by cultus 2288 days ago
One fallacy that seems universal with healthcare folks is they think the false positive rate is the chance that a given positive result is erroneous. If an illness is rare, a positive result in a test with a 1% error rate might have an overwhelming probability of being a false positive. This is why prior probabilities need to be taken into account in making decisions.
5 comments

I appreciate all the answers that point out Bayes's theorem. One thing to add is that in contrast to the usual classroom Bayes the "occurrence in the general population" is very much a variable rather than a constant. For example, Germany's low death rate can be a side effect of the false positive issue in the paper.

In fact, the stark difference between Italy and Germany would provide support to the paper's conclusion.

There may be another explanation for this difference (http://www.savespain.eu/italy-vs-germany): there is a huge disparity between the demographics of the infected populations in Italy and Germany.

My hypothesis is that this comes from closer intergenerational contact in Italy.

Right, so it's a variable. If we could independently nail down the false negative/positive rates, we could try to infer the true rate of infection, but the problem is (assuming US numbers):

* assume false positive rate of 10%, assume true positive rate of 100% (it's not but let's be generous)

* maybe 1/10 of the population (30 million) gets tested

* let's say 100,000 people have COVID right now (50x the official number)

number of positive tests = 30 million * 0.1 + 100k * 1 = 3,100,000

fraction of positive tests that actually have the disease = 1 / 31 = 3.2%

The problem is that FPR (1) depends on the population tested (i.e. p(covid | positive test) != p(covid | positive test, some symptoms) != ...), and (2) needs to be measured very accurately, because:

let's say we constrain FPR to 10% +/- 1% ==> 10% uncertainty in FPR -- that means our inference of the number of infected people is:

n_infected = (n_positive_tests - FPR * n_tested)

which is: -200,000 to 400,000

so...not very useful.
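For anyone who wants to poke at these numbers, here is a minimal Python sketch of the arithmetic above. The function and variable names are made up for illustration, and like the comment it approximates the number of uninfected people tested by the total number tested.

```python
def positive_tests(n_tested, n_infected_tested, fpr, tpr=1.0):
    """Expected positives, using the same approximation as the comment
    (false positives ~ fpr * n_tested rather than fpr * uninfected)."""
    return n_tested * fpr + n_infected_tested * tpr


def inferred_infected(n_positive, n_tested, fpr):
    """Back out the number infected, assuming a true positive rate of 100%."""
    return n_positive - fpr * n_tested


n_tested = 30_000_000   # assumed: 1/10 of the US population
n_infected = 100_000    # assumed: true infections among those tested
n_pos = positive_tests(n_tested, n_infected, fpr=0.10)

print(f"positive tests: {n_pos:,.0f}")
print(f"fraction truly infected: {n_infected / n_pos:.1%}")

# Propagate the +/- 1 percentage point uncertainty in FPR.
for fpr in (0.09, 0.10, 0.11):
    print(f"FPR={fpr:.2f}: inferred infected = "
          f"{inferred_infected(n_pos, n_tested, fpr):,.0f}")
```

A one-point shift in the assumed FPR moves the estimate by 300,000, which is three times the number of infections being estimated.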

This is just a form of the base rate fallacy, correct?

https://en.wikipedia.org/wiki/Base_rate_fallacy

Yes, it’s a special case of Bayes’ Theorem.
With all due respect, this comment shows a lack of understanding about how health professionals assess the quality of diagnostics.

My wife is a doctor (and I've learned a lot from her). In med school, they're specifically taught to evaluate diagnostics on their specificity and sensitivity - which essentially covers false positives and false negatives. If you hear a doctor talk about the "accuracy" of a test, it's likely because they're simplifying the concepts.

"Error rate" or "accuracy" is not used at the scientific level in medicine. Partly, for the reason you defined. It doesn't convey enough information about the outcome of the test.

A "99% accurate test" is pretty meaningless without understanding the specificity and/or sensitivity components. In fact, I've seen some headlines where they incorrectly refer to only one component as the "accuracy".

The specificity (true negative rate) and sensitivity (true positive rate) do not solve the problem I am describing.

If something is rare, it has a low base rate. That means even a test with excellent specificity and sensitivity could still be wrong most of the time it comes back positive.

Decisions about test accuracy simply cannot be made coherently while ignoring the base rate. For an intuitive example, suppose a test for a disease has 99% specificity and 100% sensitivity. It will always correctly give a positive result if the person has the disease, and a person without the disease has a 99% chance of correctly testing negative. Pretty good, much better than most tests.

Now suppose that 1/1000 of people have the disease. A person without the disease has a 1% chance of testing positive. If everyone is tested, then 1/1000 people will get true positive results. But (999/1000 * 0.01) ~ 1% of people will get false positives.

Thus, for a given person with a positive result, that result is nearly 10x as likely to be erroneous as accurate! As I said, the frequentist techniques that you describe and are taught in medical schools do not help with this.
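To make the arithmetic concrete, here is a small Python sketch of the same Bayes calculation. It assumes a 1/1000 base rate, 100% sensitivity, and 99% specificity (the 1% false positive rate used in the calculation above); the function name is just illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(prevalence=0.001, sensitivity=1.0, specificity=0.99)
print(f"P(disease | positive) = {ppv:.1%}")
print(f"false positives per true positive = {(1 - ppv) / ppv:.2f}")
```

Raising the prevalence argument while leaving the test unchanged shows how strongly the answer depends on the base rate.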

Yet this is endemic in medicine. This sort of thing is why in a recent meta-study of 54 landmark cancer trials, only six could be replicated. That is frankly terrifying.

I get bored by more esoteric statistics terms in epidemiology, but accuracy has a simple enough mathematical formula: https://www.lexjansen.com/nesug/nesug10/hl/hl07.pdf

(True positives + True Negatives) / number of all tested

Similar concept comes up in measuring accuracy of computerized image segmentation, where you ignore the true negatives

true positive / (true positive + false positive + false negative)

where it is called Intersection over Union (IoU).

I can’t ever remember the names, and just rebuild whatever metric I care about in terms of true vs false and positive vs negative.
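As a rough illustration of rebuilding these metrics from the four counts, here is a short Python sketch. The counts are hypothetical (1000 people tested, 1 in 1000 infected, 1% false positives) and the function names are my own.

```python
def accuracy(tp, tn, fp, fn):
    """(true positives + true negatives) / everyone tested."""
    return (tp + tn) / (tp + tn + fp + fn)


def iou(tp, fp, fn):
    """Intersection over Union: like accuracy, but ignoring true negatives."""
    return tp / (tp + fp + fn)


# Hypothetical counts, not taken from any real test.
tp, fn, fp, tn = 1, 0, 10, 989
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"IoU      = {iou(tp, fp, fn):.3f}")
```

With a rare condition the true negatives dominate, so accuracy looks excellent while IoU, which drops them, does not.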

Applying all this to the real world is tough because of the overfitting problem. Even if you got the test to be 100% accurate in your tested population, it doesn’t mean it won’t be wrong on the next person it tests. Generalization is hard. So doctors have to guess based on their understanding of the tested and untested population and the sensitivity and specificity of the test. You can go meta and give the doctor a sensitivity and specificity also.

>If an illness is rare, a positive result in a test with a 1% error rate might have an overwhelming probability of being a false positive.

Can you elaborate on this a little more...?

If a given test has a 1% chance of returning positive even when the actual result is negative, then from a sample of, say, 1000 tests we would expect about 10 false positives, in addition to any true positives. If the chance of having the disease in the general population is low (say 1 in a thousand for this example), then we would expect about 11 positive results in our thousand samples, of which 91% are incorrect: false positives.
So then you'd want to know if the cause of a false positive is random, or specific to the individual, right? If it's random, then how much would a retest change your certainty that an initial positive was a true positive?

ie, if someone was a false positive the first time would they still have a 1% chance of getting another false positive, or is it possible there's something about that individual that will always give them a positive result?
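For the random case, a retest can be treated as a second Bayesian update. The sketch below assumes the two test errors are independent, a 1/1000 base rate, 100% sensitivity, and a 1% false positive rate; all of those are assumptions for illustration. If the false positive is caused by something specific to the individual, the second test adds little or nothing and this calculation does not apply.

```python
def update(prior, sensitivity, false_positive_rate):
    """Posterior P(disease) after one positive result (Bayes' theorem)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)


p = 0.001  # assumed base rate: 1 in 1000
for n in (1, 2):
    # Each positive result uses the previous posterior as the new prior.
    p = update(p, sensitivity=1.0, false_positive_rate=0.01)
    print(f"after {n} positive test(s): P(disease) = {p:.1%}")
```

Under these assumptions one positive leaves the person probably healthy, while two independent positives make disease the likely explanation.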

Would I be correct with the following:

If the false positive rate is higher than the expected rate of disease in a given community, then the majority of positive tests will be false positives.

Does this relate to COVID in any way? Since the rates among affected communities seem to be growing rapidly. Would appreciate your thoughts.

Looking at growth rate with false positives is a bit of a mindbender: if you limit your testing to the potential contacts of a positive (false or not), you could get a "false R0" virtual epidemic from testing alone, if and only if you test more contacts per positive than 1/(false positive rate). Unfortunately, actual hospitalizations and deaths rule out a virtual epidemic so this is not a hope to cling to.
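A toy Python sketch of that threshold, assuming nobody is actually infected and every contact's test is an independent draw at the false positive rate. The function name and numbers are hypothetical.

```python
def expected_false_positives(contacts_tested, false_positive_rate, generations, seed=1.0):
    """Expected positives per generation of contact tracing when nobody
    is actually infected, so every positive is a false positive."""
    reproduction = contacts_tested * false_positive_rate  # the "false R0"
    return [seed * reproduction ** g for g in range(generations + 1)]


fpr = 0.01  # assumed false positive rate, so the threshold is 1 / fpr = 100 contacts
for contacts in (50, 100, 200):
    series = expected_false_positives(contacts, fpr, generations=4)
    print(contacts, [round(x, 3) for x in series])
```

Below 100 contacts per positive the expected count dies out, at 100 it holds steady, and above 100 it grows each generation.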
> Unfortunately, actual hospitalizations and deaths rule out a virtual epidemic so this is not a hope to cling to.

Not necessarily. In theory all the deaths could have some other cause, i.e. some fraction of people with a different underlying fatal condition had false positive tests for this coronavirus and then died of the other condition.

That's probably not what's happening, but it's theoretically possible. (It's also probable that some of the reported deaths are that, but who knows what percentage.)

If the false positive rate is p and the false negative rate is q, and the infection rate is r, then you will have p·(1-r) false positives (as proportion of the tested population) and (1-q)·r true positives. Your hypothesis p>r is not enough to settle which of those two numbers is bigger.

(Edited to fix a silly mistake: The phone rang while I was posting, so I ended up being hasty.)

Edit the 2nd: Even in the simplified case q=0, you can't easily tell.
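A quick numeric check of this in Python, with made-up values of p and r. With q = 0, false positives outnumber true positives exactly when p·(1-r) > r, i.e. p > r/(1-r), which is slightly stronger than p > r.

```python
def positives(p, q, r):
    """Return (false positives, true positives) as fractions of those tested.
    p = false positive rate, q = false negative rate, r = infection rate."""
    return p * (1 - r), (1 - q) * r


# Both cases satisfy p > r with q = 0, yet they land on opposite sides.
for p, r in ((0.0101, 0.01), (0.02, 0.01)):
    fp, tp = positives(p, q=0.0, r=r)
    print(f"p={p}, r={r}: false={fp:.6f}, true={tp:.6f}, mostly false: {fp > tp}")
```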

Does 1% error rate mean it's positive 1% of the time or wrong 1% of the time?
How does it relate to universal Healthcare? I'm not an American so maybe I'm missing something.
A "fallacy that seems universal with healthcare folks" is not expected to be related to universal healthcare.
Yes, that was rather embarrassing. Sorry about that.
This is a classic result in Bayesian theory.

Here is someone else explaining it.

https://betterexplained.com/articles/an-intuitive-and-short-...

That is a really excellent explanation. Thanks for sharing it.
In the extreme case, imagine that nobody actually has the condition. Then every positive result is a false positive. The thing we call the "false positive" rate might still be only 1% though; that just means the test is correctly identifying 99% of people as negative and 1%, falsely, as positive.
If you test 1000 people with a test which has a 1% false positive rate (let's ignore false negatives), but only one person in the 1000 is really infected, you get about 11 positive test results, of which 10 are wrong.
too slow...
As always, there's an XKCD for this: https://xkcd.com/1132/

Basically, imagine that it's a decade from now, and no one in the world has COVID-19 anymore. The test still has a 1% false-positive rate, though - so if you test a few thousand people, a few of them will test positive. Given that setup, every single one of the test positives will be false positives.

The same holds true if there's one infected person and you test 1,000,000.

I don’t follow but am interested in your point. Please elaborate.
Suppose a test has a 1/100 false positive rate, and the true incidence of the disease over the population that you're testing is 1/1000 (that is, you expect a true positive about once in every 1000 tests).

In this case, every positive test you observe is around 10x more likely to be a false positive than a true positive.

Got it. Thanks! I have a master's in public health and still need to get schooled by Hacker News! Haha