Hacker News
by metaphor 2542 days ago
This line caught my attention:

> The FBI said its system is 86 percent accurate at finding the right person if a search is able to generate a list of 50 possible matches, according to the GAO. But the FBI has not tested its system’s accuracy under conditions that are closer to normal, such as when a facial search returns only a few possible matches.

What the GAO study[1] actually said:

> However, we found that the tests were limited because they did not include all possible candidate list sizes and did not specify how often incorrect matches were returned. ... The FBI’s detection rate requirement for face recognition searches at the time stated that when the person exists in the database, NGI-IPS shall return a match of this person at least 85 percent of the time. However, we found that the FBI only tested this requirement with a candidate list of 50 potential matches. In these tests, 86 percent of the time, a match to a person in the database was correctly returned. The FBI had not assessed accuracy when users requested a list of 2 to 49 matches.

> According to FBI, a smaller list would likely lower the accuracy of the searches as the smaller list may not contain the likely match that would be present in the larger list.

In other words, their acceptance test procedure was gamed from the beginning.
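
To see why testing only at a list size of 50 flatters the number, here is a rough sketch of how the hit rate depends on candidate list size. Everything in it is made up for illustration: the ranks are synthetic (drawn from an arbitrary distribution), and `hit_rate_at_k`, `true_ranks`, and `gallery_size` are hypothetical names, not anything from the FBI system or the GAO report.

```python
import random

def hit_rate_at_k(true_ranks, k):
    """Fraction of searches whose true match appears in the top-k candidate list."""
    return sum(1 for r in true_ranks if r <= k) / len(true_ranks)

# Synthetic ranks (1 = best) of the true match in 1,000 toy searches against
# a 500-person gallery. This is NOT FBI or GAO data, just an arbitrary
# distribution chosen to show the shape of the effect.
rng = random.Random(0)
gallery_size = 500
true_ranks = [min(gallery_size, 1 + int(rng.expovariate(1 / 25)))
              for _ in range(1000)]

for k in (2, 10, 50, gallery_size):
    print(f"list size {k:>3}: hit rate {hit_rate_at_k(true_ranks, k):.0%}")
```

Whatever the underlying distribution, the hit rate can only go up as the list gets longer, and it reaches 100% once the list is as large as the gallery. A figure measured at 50 says little about what happens at 2 to 49.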

[1] https://www.gao.gov/assets/700/699489.pdf

4 comments

Ugh, this is the worst part about this mess IMHO. U.S. law enforcement has time and time again shown that they are more than willing to argue in bad faith about statistics. This is going to turn out no different than fishing expeditions based on partial DNA matches, where the prosecution predictably finds some 1/100,000 match after searching a database of 250,000 people, uses that as some cornerstone of "obvious guilt", and convinces the jury that there's only a 1/100,000 chance that he's innocent.

So much of forensic science is a sham. We claim as a country to uphold a system of justice whereby you're innocent unless proven guilty beyond a reasonable doubt, yet so many prisoners on death row have been exonerated, some even posthumously, when there's no real incentive to look for evidence of innocence at that point.

Who wants to bet that prosecutors are going to start using flawed facial recognition results as if they are equivalent to a victim picking out a suspect from a lineup of 10 people? "There was a 99% chance of a match"
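
For the database-search example above, the arithmetic can be sketched in a few lines. This assumes every profile in the database is unrelated to the sample and independently has the same 1/100,000 chance of matching by coincidence, which is a simplification; the variable names are just for illustration.

```python
# Numbers taken from the comment above: a 1-in-100,000 partial match
# searched against a database of 250,000 people.
random_match_probability = 1 / 100_000
database_size = 250_000

# Assumption: profiles match independently with the same probability.
expected_coincidental = database_size * random_match_probability
p_at_least_one = 1 - (1 - random_match_probability) ** database_size

print(f"expected coincidental matches: {expected_coincidental:.1f}")  # 2.5
print(f"chance of at least one: {p_at_least_one:.0%}")                # 92%
```

So under these assumptions a search like that would be expected to turn up a couple of innocent "matches" on its own, which is very different from a 1/100,000 chance of innocence.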

I've worked in DNA forensics/DNA databasing/LEO IT for about 10 years. I have never heard of DNA evidence with a statistical likelihood of 1/100,000 being presented in court. It is true that all evidentiary DNA is presented in a statistical manner, but the statistical thresholds are far stricter than that: match probabilities rarer than 1 in 10x the number of people on earth. A professional accredited DNA forensics laboratory would never publish or release a report with shoddy statistics like that. If a lab attempted it today, in 2019, it would ruin careers and shut the lab down. We also work on OLD exoneration cases.

Maybe prior to the early 90's when the technology and chemistries were still kinda crude and not every lab could afford accreditation, but definitely not in the US in the past 10 years.

To be clear, I am not arguing about whether it's ethical to use DNA databases or facial recognition from driver's license databases. I'm saying that comparing the use of DNA evidence to using facial recognition on a driver's license database doesn't make sense.

I totally never thought of stats for forensic science being presented in a misleading way before like in your example but it completely makes sense. The odds of finding _any_ 1/100,000 match are incredibly high. Wow.

Scientific integrity in forensic sciences isn't exactly a strong point for the FBI: https://slate.com/news-and-politics/2015/04/fbis-flawed-fore...

Exactly. They could get 100% matching with 50 results returned out of 50 subjects in the database.

But also, no-one understands percentages. WaPo should really present this information as a probability tree or an icon array chart.

I'm not sure I can follow your thinking when you say that a probability tree is easier to understand than a percentage. Has this been broadly scientifically understood? Is this something you have found within your field of work?

NICE, Cochrane, and the English NHS recommend talking about natural frequencies and not percentages. They also say that if you have to use percentages you should use absolute numbers, not relative numbers.

Take something really simple about percentages. What does 0.1% mean? Only 1 in 4 people know this means "1 in 1000".

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3310025/

> Gigerenzer et al show how only 25% of the general population could correctly identify 1 in 1000 as being the equivalent of 0.1%.

Take something a little bit more complex, such as the relative increase in risk vs the actual total risk. We know that most people do not understand what a 75% increase in risk means in real terms. But most people do understand a simpler explanation: of people who don't eat $THING we'd expect to see 1 person out of 1000 people over ten years developing a disease, but if 1000 people all eat $THING every day over ten years we'd expect to see about 2 people developing the disease.

See also cancer screening: (This is a good useful link that rattles through most of what I'd want to say) https://www.nice.org.uk/guidance/ng34/evidence/expert-paper-...

“If you participate in breast screening, you will reduce your chances of dying from breast cancer in the next 10 years by 24%” versus

“If you participate in breast screening, you will reduce your chances of dying from breast cancer in the next 10 years from 37 in 10,000 to 28 in 10,000”
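
The two statements describe the same numbers. A quick sketch of the conversion, with `risk_summary` as a made-up helper name:

```python
def risk_summary(baseline_per_10k, treated_per_10k):
    """Return (absolute reduction per 10,000, relative reduction as a fraction)."""
    absolute = baseline_per_10k - treated_per_10k
    relative = absolute / baseline_per_10k
    return absolute, relative

# Figures from the two breast screening statements above.
absolute, relative = risk_summary(37, 28)
print(f"absolute reduction: {absolute} in 10,000")  # 9 in 10,000
print(f"relative reduction: {relative:.0%}")        # 24%
```

The "24%" headline and the "9 fewer deaths per 10,000" framing are the same fact, but the second one also tells you how small the baseline risk is.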

BMJ has more about relative vs absolute risk: https://bestpractice.bmj.com/info/toolkit/practise-ebm/under...

Here's a final example. For some time we knew members of the public couldn't do this, but we thought healthcare professionals could. Turns out that they couldn't do it either. Both groups find it much easier if you convert this into natural frequencies and probability trees.

"A machine has been invented to scan a population for a disease. The machine is good but not perfect. If you have the disease there is a 90% chance it will return positive. If you do not have the disease there is a 1% chance it will return positive. About 1% of the population have the disease. Mr Smith is tested, and the test comes back positive. What's the chance Mr Smith actually has the disease?"

(This is from "Reckoning with Risk" by Gerd Gigerenzer).

Most people cannot get the right answer from this question.

If you reword the question they can.

"Think of a group of 100 people. 1 of them has a disease. The entire group is screened. The one person who has the disease tests positive. Of the 99 people who don't have the disease one person will also test positive. How many people of those who test positive have the disease?"

You can also show this as a probability tree. https://imgur.com/a/JWVQRxI
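
For anyone who wants to check the Mr Smith question, here is a small sketch that works it both ways: with Bayes' rule on the percentages, and with natural frequencies. The function names are invented for this example, and the population of 10,000 is an arbitrary choice that keeps the counts whole.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

def natural_frequencies(population, prevalence, sensitivity, false_positive_rate):
    """Counts of positives among the sick and among the healthy."""
    sick = round(population * prevalence)
    healthy = population - sick
    sick_positive = round(sick * sensitivity)
    healthy_positive = round(healthy * false_positive_rate)
    return sick_positive, healthy_positive

# Numbers from the question: 1% prevalence, 90% detection, 1% false positives.
ppv = positive_predictive_value(0.01, 0.90, 0.01)
print(f"Bayes' rule: {ppv:.1%}")  # 47.6%

sick_pos, healthy_pos = natural_frequencies(10_000, 0.01, 0.90, 0.01)
print(f"Of {sick_pos + healthy_pos} positives, {sick_pos} have the disease")
```

Out of 10,000 people, 90 of the 100 who are sick test positive and 99 of the 9,900 who are healthy also test positive, so a positive result means roughly a coin flip. The reworded 100-person version rounds this to 1 out of 2.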

Two books I recommend are "Reckoning With Risk" and "Risk Savvy", both by Gigerenzer.