Hacker News
by wtvanhest 2232 days ago
Would you or someone else mind expanding on this thought a little? Why does 99.9% specificity mean that 10% will be false positives?

{added: great answers below} It's well worth understanding this point. In short, specificity tells you what fraction of the people without antibodies test negative, so 1 - specificity is the fraction of them who get a false positive. It doesn't give you the ratio of false positives to all positives, or the probability that a positive test means you actually have the antibodies.

6 comments

Let me give it a try. Suppose we have 100,000 people in a statistically representative town.

If 1% of people have had COVID-19, then that's 1000 people who have had it, and 99,000 people who haven't.

The test has a sensitivity of 100%, which means all 1000 people who've had it will test positive.

The test has a specificity of 99.9%, which means 98,901 of the 99,000 people who haven't had it will test negative; but that leaves 99 people who haven't had it, but test positive anyway.

That gives us 1099 people who look like they have immunity; but only 91% of those people are actually immune: 9% of the people are false positives.

If instead we have a specificity of 99%, then only 98,010 of the 99,000 people who haven't had it will test negative, leaving 990 people who haven't had it but test positive anyway.

That gives us 1990 people who look like they have immunity; but only 50% of them actually do -- the other 50% are false positives.
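The head count above can be reproduced in a few lines of Python. This is only a sketch: the function name `count_outcomes` and its parameters are made up for illustration, and it works with expected counts rather than simulating individual people.

```python
def count_outcomes(population, prevalence, sensitivity, specificity):
    """Split a population into expected true and false positives."""
    infected = population * prevalence
    healthy = population - infected
    true_pos = infected * sensitivity         # infected people who test positive
    false_pos = healthy * (1 - specificity)   # uninfected people who test positive anyway
    share_real = true_pos / (true_pos + false_pos)
    return round(true_pos), round(false_pos), share_real

# Same town as above: 100,000 people, 1% prevalence, 100% sensitivity.
for spec in (0.999, 0.99):
    tp, fp, share = count_outcomes(100_000, 0.01, 1.0, spec)
    print(f"specificity {spec:.1%}: {tp} true + {fp} false positives, "
          f"{share:.0%} of positives are real")
```

Changing the `prevalence` argument is the quickest way to see how much the answer depends on it.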

So if I'm understanding this correctly, with this test:

If you test negative, you are clear, guaranteed, no false negatives.

If you test positive, there is a 10% chance it's a false positive.

My follow-up question: does retesting the positive population make that false positive rate drop to 0.1%, or are false positives caused by something specific to the individual rather than by random chance?

> If you test positive, there is a 10% chance it's a false positive.

Well, don't misunderstand -- it's got nothing to do with the test per se, but with the probability that you had the disease in the first place.

The test itself has two probabilities:

1. If you've had COVID-19, the probability that it will report positive (sensitivity)

2. If you haven't had COVID-19, the probability that it will report negative (specificity)

But those probabilities give you a mapping from reality -> test_result. What you want is the reverse of that: the probability going from test_result -> reality. To get that, you have to factor in the probability that you had the disease in the first place.

If 50% of the population has had COVID-19, then a positive test means a 99.9% probability of having had the virus. If 1% of the population has had it, a positive test means a 91% probability that you have had it. If only 1 in a million people had had COVID-19, then the number of false positives would completely overwhelm the number of true positives.
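Here is a rough sketch of that reversal using Bayes' rule, swept over the three base rates mentioned above. The function name is hypothetical, and the defaults (100% sensitivity, 99.9% specificity) are simply the figures used in this thread.

```python
def prob_infected_given_positive(prevalence, sensitivity=1.0, specificity=0.999):
    """Bayes' rule: P(had it | positive test)."""
    true_pos = sensitivity * prevalence                 # P(positive and had it)
    false_pos = (1 - specificity) * (1 - prevalence)    # P(positive and never had it)
    return true_pos / (true_pos + false_pos)

# 50% of the population, 1% of the population, 1 in a million.
for prevalence in (0.5, 0.01, 1e-6):
    p = prob_infected_given_positive(prevalence)
    print(f"prevalence {prevalence:g}: P(had it | positive) = {p:.4f}")
```

At a base rate of one in a million, the result comes out at roughly one in a thousand: almost every positive is a false one.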

This is sometimes called the "Base rate fallacy": forgetting to factor in the base rate when determining something like this.

It's important for things like, say, systems which automatically detect terrorists at airports. How many travelers at an airport are actually terrorists planning to attack a plane? It's got to be one in hundreds of millions, if not billions. With that low of a base rate, even if you had a system that was 99.999% accurate, the vast majority of people it flagged up would be innocent.

I had the same question about retesting. Here’s a quote from Scott Gottlieb (former FDA commissioner):

“While all of these tests can still generate false positives—a finding that you have the antibodies when you don’t—that risk can be sharply reduced by repeating the test if it comes back positive. The predictive value of two consecutive positive tests is high enough that you can be confident antibodies are present.”

https://www.wsj.com/articles/antibody-knowledge-can-be-power...
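To put rough numbers on the retest idea: the sketch below feeds the probability after the first positive back in as the starting point for the second test. It assumes the two results are independent given whether you really have antibodies, which is exactly the open question upthread; if a false positive comes from something persistent in a person's blood, a repeat of the same test may just repeat the error. The function name and the 1% prior are illustrative assumptions.

```python
def update_on_positive(prior, sensitivity=1.0, specificity=0.999):
    """One Bayes update: probability of antibodies after a positive result."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

prior = 0.01                                     # assumed 1% prevalence, as in the thread
after_first = update_on_positive(prior)          # about 0.91
after_second = update_on_positive(after_first)   # first posterior becomes the new prior

print(f"after one positive:  {after_first:.4f}")
print(f"after two positives: {after_second:.6f}")
```

Under the independence assumption, the chance that two consecutive positives are both false comes out well below 0.1%.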

> If you test positive, there is a __% chance it's a false positive.

This percentage is based on both the test and the real infection rate.

Let's say you have 1000 people, of whom 1% (10) have had the virus and are seropositive (have antibodies).

Of the 990 people who do not have antibodies, 99.9% (989.01) are correctly identified as not having them, and 0.99 is incorrectly identified as having them.

Of the 10 who actually have antibodies, 100% are correctly identified.

So 10.99 are identified as having antibodies, 0.99 of them incorrectly, which is about 9%.

https://en.wikipedia.org/wiki/Positive_and_negative_predicti...

If the number of true positives is 10/1000, and the test gives you 11/1000 positive results, then 1/11 of your tested positive results are false positives. (Actually closer to 9% than 10%).

via Bayes Rule: “Assuming an underlying infection rate P(I), what’s the probability that a person is actually immune (=was infected), given that they test positive, i.e. P(I|+)?”:

  P(I|+) = P(+|I)*P(I) / P(+)
         = Sens*P(I) / [Sens*P(I) + (1-Spec)*(1-P(I))]
         = .01 / (.01 + .001*.99)
         ≈ 0.91

This is exhibit A of the base rate fallacy (https://en.m.wikipedia.org/wiki/Base_rate_fallacy).

Bayes' theorem, assuming a 1% actual positive rate: if you test 1000 people, you will get roughly 10 true positives and 1 false positive, so roughly 10% of the positive results are false.

I think it's easier to understand when you take it to the extremes: assume nobody has the thing you're testing for. You test 100k people at 99.9% specificity, which means you get about 100 positives just from the 0.1% error rate. Since nobody has the thing you're testing for, they're all false.

When the thing you're testing for is very rare, most of the people who test positive won't actually have it.
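The extreme case is easy to try as a small simulation. This is a sketch, not a model of any real test: `simulate` is a made-up helper that treats each person's result as an independent random draw, and the fixed seed is only there to make runs repeatable. With 100k people and a 0.1% error rate you would expect on the order of 100 positives, every one of them false.

```python
import random

def simulate(population, prevalence, sensitivity, specificity, seed=0):
    """Test everyone once; return (true_positives, false_positives)."""
    rng = random.Random(seed)
    true_pos = false_pos = 0
    for _ in range(population):
        has_it = rng.random() < prevalence
        if has_it:
            positive = rng.random() < sensitivity     # detected with prob = sensitivity
        else:
            positive = rng.random() >= specificity    # wrongly flagged with prob = 1 - specificity
        if positive and has_it:
            true_pos += 1
        elif positive:
            false_pos += 1
    return true_pos, false_pos

# Nobody has the condition, yet some people still test positive.
tp, fp = simulate(100_000, 0.0, 1.0, 0.999)
print(f"nobody infected: {tp} true positives, {fp} false positives")
```

Raising `prevalence` from 0.0 towards 0.01 or 0.1 shows the true positives gradually outnumbering the false ones.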

But we know that more than 1% of people have been infected.

We have good data that the IFR is in the 0.1-1% range, which puts infections in, say, MA at around 7% of the population a couple of weeks ago (allowing for the time from infection to death). Based on confirmed cases, that would put it well above 10% now.

That means that out of 100k people tested at 99.9% specificity, you'd have roughly 90 false positives against 10k true positives.