Hacker News
by x1798DE 4394 days ago
Well, he's saying that he has a sample of 40k hackers' passwords stored up somewhere, and between them there are 2000 unique strings, ~1200 of which were in plain text and didn't need to be cracked at all. So if this sample of 40k hacker passwords is a random sampling, then essentially he has a random unbiased sample of 1200 unique passwords, plus a biased set of 300 more.

He's not super clear about where the 40k passwords came from, so they may be a random sample, but it's quite possible that it's just a sampling of bad hackers - he mentions that he has gathered many examples of bots and shells and such, so you can imagine that he's looking at a sampling of 1. hackers whose bots store their passwords in such a way that he can reverse-engineer where they are stored and 2. hackers who store their passwords in plain-text.

That said, if he has 40,000 passwords that boil down to 2000 unique strings, of which only ~400-500 are either good passwords stored in plaintext or not easily crackable, then about 35,000 of the 40,000 passwords he captured were easily guessable, which is about 87.5% of his sample. (I'm assuming here that there were no duplicates in the "good" password set.)
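For anyone who wants to check the percentage, here is a minimal sketch of the arithmetic. The two figures are the rough ones from the comment above, not exact counts from the original article, and the variable names are just illustrative.

```python
# Rough figures taken from the comment above (estimates, not exact counts).
total_captured = 40_000     # passwords in the captured sample
easily_guessable = 35_000   # estimated number that were easily guessable

# Share of the sample that was easily guessable.
weak_share = easily_guessable / total_captured
print(f"{weak_share:.1%}")  # prints 87.5%
```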

1 comment

>it's quite possible that it's just a sampling of bad hackers - he mentions that he has gathered many examples of bots and shells and such, so you can imagine that he's looking at a sampling of 1. hackers whose bots store their passwords in such a way that he can reverse-engineer where they are stored and 2. hackers who store their passwords in plain-text.

Yes, that's basically my point. The set of hackers who use strong passwords and the set of hackers who don't protect those passwords well in their bots/viruses/whatever probably don't have a lot of overlap.

Also, it sounds like he couldn't crack (and thus couldn't include in the sample) some of the hashed passwords. Passwords that he can't crack or brute-force in a reasonable time are probably strong passwords. Not having those passwords biases the sample. It's like giving a standardized test when all the honors classes are on a field trip: by removing the top end you bias the sample downward and make the overall result look worse.
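To make the field-trip analogy concrete, here is a small sketch with made-up test scores (the numbers are purely hypothetical, chosen only to show the direction of the effect). Dropping the top group before averaging pulls the average down, the same way leaving out the uncrackable passwords would make the sample look weaker.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only.
regular_scores = [55, 60, 65, 70, 75]
honors_scores = [85, 90, 95]

# Average over everyone versus the average when the honors classes are absent.
full_mean = mean(regular_scores + honors_scores)
truncated_mean = mean(regular_scores)

print(full_mean, truncated_mean)
print(truncated_mean < full_mean)  # removing the top end lowers the average
```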

I agree that the 40k sample is probably biased, but if you assume it's not actually biased, your second point doesn't hold. The ones he couldn't crack are presumptively strong, so adding in the ones that he knows are strong because he found them in some plaintext form, that leaves about 500 passwords out of 40k that he couldn't find. If anything, the uncracked passwords bias you towards thinking their passwords are stronger than they are, since it's possible that some of them are just weak passwords stored in some non-standard way, or that there's a salt included in the program that he missed.