Hacker News new | ask | show | jobs
by orblivion 3384 days ago
Per commit, or over the expected lifetime of Git's utilization of sha-1? If it's the former it's not very useful unless we have an idea how many commits are made.
3 comments

You should expect to see a hash collision with probability >½ with about sqrt(2^n) hashed objects for a hash of length n. I believe hashes are made per changed file per commit.

A large project (like Firefox) might make a few hundred commits per day, or tens of thousand per year. So that comes out to 2^20-2^30 hashes per year. I don't know how many distinct repositories GitHub has, but I doubt it's larger than 2^30. So that means that GitHub has no more than 2^60 SHA-1 hashes related to git, and probably more like 2^40-2^45.

So the probability of a collision is <<2^-30 (the collision function is logistic, so assuming linearity between 1 and 2^90 is a wildly overoptimistic assumption) in the optimistic case, probably something like <<2^50. In perspective, it's more likely that you will win the lottery tomorrow but fail to discover so because you were killed by lightning than it is that GitHub will detect a chance SHA-1 collision.

Wait, are we talking about chance SHA-1 collisions, or a false positive for this test for this vulnerability?
It's not logistic, that's a specific kind of function.
There are ~3.1e9 seconds in a century, and ~7.5e9 people on Earth. So if every human makes a git commit every second for a century, the odds of a single false positive is ~5.3e-7 (i.e. less than one in a million).
if every human makes a git commit every second for a century

Will someone please write a dystopian novel around this premise? It almost writes itself. GitHub turns evil, forcing the world population into subservience from birth; using Git is all anyone is ever allowed to do, and is how people conceptualize the universe and access entertainment; various cults form with the belief that randomness can be influenced via the right ceremony...

You can source inspiration from https://twitter.com/DystopianYA

I was recently thinking of another idea where humanity works for Google and everyone's job is to solve recaptchas. Everything is free for people who earn enough credits by solving the puzzles.
Isn't this an episode of black mirror?
It is.
Oh funny, which episode? I've never seen the show before.

Someone here had mentioned that they try to incorrectly answer recaptchas as a way to spite them for trying to steal free labor from them. My mind took that to the extreme of everyone working for Google... :)

Sounds like the plot to For-Profit Online University[1]. "I'm a digital gardener."

[1] https://www.youtube.com/watch?v=XQLdhVpLBVE

If it is not a movie yet.. you should definitely write one :D
> various cults form

...each believing in a Supreme branching strategy

And your status in society is determined by the number of consecutive digits of the hash of your very first git commit.
Alternative premise, tech interviews continue to become increasingly competitive, and if you want to be a true dev hirable by Google, etc. this is the output you need.
I could even see a matrix like aspect where those who are unplugged do so by creating hash collisions to hide what they are really doing.
You're right, I suck at probabilities and did it wrong: (3.1e9*7.5e9)/2^90 .
Well, I'd lay a fair amount of money that the number of commits is less than 10^20...