by peepeepoopoo3 1158 days ago
Remember, according to Wikipedia, Benford's law applies to election data in every country except the United States, where the laws of statistics are totally different.
2 comments

Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

No, I don't think I'm going to go out and change everything about my replies.
For elections, the test should be with the second digit, not the first.

See https://en.m.wikipedia.org/wiki/Benford's_law

> Benford's law has also been misapplied to claim election fraud. When applying the law to Joe Biden's election returns for Chicago, Milwaukee, and other localities in the 2020 United States presidential election, the distribution of the first digit did not follow Benford's law. The misapplication was a result of looking at data that was tightly bound in range, which violates the assumption inherent in Benford's law that the range of the data be large. The first digit test was applied to precinct-level data, but because precincts rarely receive more than a few thousand votes or fewer than several dozen, Benford's law cannot be expected to apply. According to Mebane, "It is widely understood that the first digits of precinct vote counts are not useful for trying to diagnose election frauds."
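
To make the range argument in that passage concrete, here is a minimal sketch in Python. The data is entirely made up: `narrow` stands in for precinct totals confined to 300-900 votes, and `wide` for counts spread across six orders of magnitude. Neither is real election data, and the bounds are assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter

def benford_first_digit():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit 1-9 among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical precinct totals: every value between 300 and 900 (narrow range).
narrow = [rng.randint(300, 900) for _ in range(10_000)]

# Hypothetical counts spread log-uniformly over six orders of magnitude.
wide = [int(10 ** rng.uniform(0, 6)) for _ in range(10_000)]

expected = benford_first_digit()
narrow_shares = first_digit_shares(narrow)
wide_shares = first_digit_shares(wide)

# Columns: digit, Benford expectation, narrow-range share, wide-range share.
for d in range(1, 10):
    print(d, round(expected[d], 3), round(narrow_shares[d], 3), round(wide_shares[d], 3))
```

The narrow-range data can never start with 1 or 2, so it "fails" the first-digit test without any fraud involved, while the wide-range data tracks the Benford column closely.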

The other examples on this page used the second digit for their election analysis.
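
For anyone who wants to try the second-digit version, a rough sketch follows. The expected probabilities come from the standard Benford formula for the second digit; the helper names (`benford_second_digit`, `second_digit_counts`, `chi_square`) are mine, and the plain Pearson chi-square here is an assumption for illustration, not necessarily the exact procedure Mebane's 2BL test uses.

```python
import math

def benford_second_digit():
    """Expected second-digit probabilities (digits 0-9) under Benford's law.

    P(d) = sum over first digits k = 1..9 of log10(1 + 1 / (10k + d)).
    """
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }

def second_digit_counts(values):
    """Count second digits, skipping values with fewer than two digits."""
    counts = [0] * 10
    for v in values:
        s = str(abs(int(v)))
        if len(s) >= 2:
            counts[int(s[1])] += 1
    return counts

def chi_square(observed, expected_probs):
    """Pearson chi-square statistic of observed counts against expected shares."""
    n = sum(observed)
    return sum(
        (observed[d] - n * p) ** 2 / (n * p) for d, p in expected_probs.items()
    )

probs = benford_second_digit()
for d in range(10):
    print(d, round(probs[d], 4))
```

The second-digit distribution is much flatter than the first-digit one (roughly 12% for 0 down to 8.5% for 9). With ten categories the statistic has 9 degrees of freedom, so values above about 16.9 would be unusual at the 5% level, assuming enough precincts for the approximation to hold.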

Matt Parker did a good video with lots of visuals to explain why Benford's Law works in general, but why it cannot always be simply applied to election results.

https://youtu.be/etx0k1nLn78

Except the other digits failed too.
They are quoting from the Wikipedia article you yourself suggested. You are free to suggest another source.

Wikipedia cites this article:

https://physicsworld.com/a/benfords-law-and-the-2020-us-pres...

> In a working paper published on 10 November, Mebane looks deeper at the US election data using a 2BL test, based on the second digits and Benford’s law digit probabilities, along with other statistical tools.

> The bottom line: there are no signs of irregularity in the officially declared precinct vote counts data from Fulton County, GA, Allegheny County, PA, Milwaukee, WI, and Chicago, IL, as some have claimed.

That article cites this paper:

http://www-personal.umich.edu/~wmebane/inapB.pdf

> The vote counts from the four jurisdictions are not final, so one should treat them cautiously. Nonetheless preliminary analysis shows little that suggests there are problems.

Presumably a more up-to-date source exists now that the votes are finalized. The paper also links to the data they used on GitHub if you'd like to see for yourself (and both this paper and the one below say they downloaded it from each state's Secretary of State website, so presumably you could do that too if you didn't trust this random GitHub).

This paper from MITRE is interesting but doesn't use 2BL. (They don't find any evidence of fraud. They do discuss 2BL in an appendix.)

https://apps.dtic.mil/sti/trecms/pdf/AD1148123.pdf

I remember from an election fraud class in college that the best digits to check on vote counts (particularly in places like Russia) are actually the trailing digits, which should be uniformly distributed. Apparently Eastern European fraudsters at this point are sophisticated enough (and have enough leeway to fudge the vote counts) that they can get past checks based on Benford's law, but they are usually too lazy to whiten the trailing digits of their fake numbers.
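
A trailing-digit check along those lines is only a few lines of Python. This is a sketch under my own assumptions: the function name `last_digit_chi_square` is made up, the inputs are synthetic numbers rather than real returns, and it assumes counts large enough (say, three digits or more) for the last digit to be roughly uniform.

```python
from collections import Counter

def last_digit_chi_square(vote_counts):
    """Pearson chi-square of last digits against a uniform 10% share each."""
    counts = Counter(v % 10 for v in vote_counts)
    expected = len(vote_counts) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Synthetic "clean" counts: every last digit appears equally often.
clean = list(range(100, 1100))

# Synthetic "rounded" counts: someone only ever writes numbers ending in 0 or 5.
rounded = [5 * v for v in range(20, 220)]

print(last_digit_chi_square(clean))
print(last_digit_chi_square(rounded))
```

The perfectly even data scores zero and the rounded data scores far above the roughly 16.9 cutoff for 9 degrees of freedom at the 5% level. Real returns land somewhere in between, so a single large value is a prompt to look closer rather than proof of anything.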

I was curious about 2020 after the pop science emerged and checked all of these precincts' trailing digits (as well as a few other statistics), and they looked totally fine.