Hacker News
by jonchang 3118 days ago
It is justified, both in the citation and the actual text. The ARD dataset is driven by self-reporting from various local agencies. The fact that the FBI's SHR data and the ARD dataset have a mismatch (that is, there are police-related homicides in SHR that are not present in the ARD data and vice versa) is proof enough that there is underreporting in these datasets!
1 comment

I just realized that underreporting is already accounted for in the 1,250/y thanks to the statistical analysis described in the article.

A = the number of jurisdiction-reported homicides

B = the number of media-reported homicides

M = the number of homicides on both lists

N = AB / M

Now, if jurisdiction-reported homicides are underreported by a factor of X, we can get a more accurate figure for A by multiplying A by X. We also multiply M by X, because the cases added to list A should show up on list B at the same rate (on average) as the cases already on it. And the estimate doesn't change.

N = (XA * B) / (XM) = AB / M
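
To check the cancellation numerically, here is a rough sketch in Python. The counts and the factor X are made up for illustration and are not taken from the article or from any dataset; the function name estimate_total is my own. It only demonstrates the algebra above, under the same assumption that scaling A by X scales M by X too.

```python
from fractions import Fraction


def estimate_total(a, b, m):
    """Two-list estimate N = A * B / M, as defined above."""
    if m == 0:
        raise ValueError("no overlap between the two lists; estimate is undefined")
    return Fraction(a) * Fraction(b) / Fraction(m)


# Hypothetical counts, for illustration only (not real data).
A = 300   # jurisdiction-reported homicides
B = 400   # media-reported homicides
M = 100   # homicides on both lists
X = Fraction(3, 2)  # hypothetical underreporting factor for list A

baseline = estimate_total(A, B, M)
# Assumption: correcting A by X adds matches at the same rate, so M scales by X as well.
scaled = estimate_total(X * A, B, X * M)

print(baseline, scaled)
```

Exact fractions are used so the two results can be compared without floating-point rounding. If M did not scale along with A, the two estimates would differ, which is where the assumption matters.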

This assumes, of course, that homicides in the jurisdictions that don't report to the FBI or BJS are still reported by the media. That may not be true, but if it isn't, that needs to be shown.