by ZachPruckowski 4394 days ago
>it's quite possible that it's just a sampling of bad hackers - he mentions that he has gathered many examples of bots and shells and such, so you can imagine that he's looking at a sampling of 1. hackers whose bots store their passwords in such a way that he can reverse-engineer where they are stored and 2. hackers who store their passwords in plain-text.

Yes, that's basically my point. The set of hackers who use strong passwords and the set of hackers who don't protect those passwords well in their bots/viruses/whatever probably don't overlap much.

Also, it sounds like he couldn't crack some of the hashed passwords, so he couldn't include them in the sample. Passwords that he can't crack or brute-force in a reasonable time are probably strong passwords. Leaving them out biases the sample. It's like giving a standardized test while all the honors classes are on a field trip: by removing the top end, you bias the sample downward and make the overall results look worse.


I agree that the 40k sample is probably biased, but if you assume it isn't, your second point doesn't hold. The ones he couldn't crack are presumptively strong, but once you also count the strong ones he found in some plaintext form, that leaves only about 500 passwords out of 40k that he couldn't recover, too few to shift the overall picture much. If anything, treating the uncracked passwords as strong biases you toward thinking their passwords are stronger than they are, since some of them may just be weak passwords stored in some non-standard way, or there's a salt included in the program that he missed, or something like that.
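To put a rough number on how much ~500 missing passwords out of 40k can matter, here's a minimal Python sketch. It assumes the uncracked passwords are the only missing data. `strong_share_bounds` is a hypothetical helper, and the counts below (2,000 strong out of 39,500 recovered) are made up for illustration, not taken from his data.

```python
from typing import Tuple


def strong_share_bounds(cracked_strong: int, cracked_total: int,
                        uncracked: int) -> Tuple[float, float]:
    """Bound the share of strong passwords in the full sample.

    Hypothetical helper: cracked_total passwords were recovered, of which
    cracked_strong were judged strong; `uncracked` passwords were never
    recovered, so their strength is unknown.
    """
    total = cracked_total + uncracked
    # Worst case: every uncracked password is actually weak
    # (e.g. stored in a non-standard way, or hashed with a missed salt).
    low = cracked_strong / total
    # Best case: every uncracked password is strong.
    high = (cracked_strong + uncracked) / total
    return low, high


# Illustrative counts only (assumed, not from the original data).
low, high = strong_share_bounds(cracked_strong=2_000, cracked_total=39_500, uncracked=500)
observed = 2_000 / 39_500
print(f"observed={observed:.4f} low={low:.4f} high={high:.4f}")
# prints observed=0.0506 low=0.0500 high=0.0625
```

Whatever the real split turns out to be, the observed share always falls between the two bounds, and the bounds are only `uncracked / total` apart, which for 500 out of 40,000 is 1.25 percentage points.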