| That's probably the wrong way to think about it, and might -- as it does in this case -- lead to a ridiculously oversized password-guessing implementation which tries to do too much fancy business. The most obvious way to do password strength checking would not (I don't think) let you "use this list to begin cracking", but would instead estimate the Kolmogorov complexity of the password, as a proxy for its entropy. That sounds daunting, but it's actually pretty simple in principle: append the password to a couple concatenated dictionaries plus popular password files, and see how much it compresses with your favorite zipping algorithm. Compare it to how much 'password1' zips, because you know that's the first one that they try and therefore it has complexity 2^0. If the zipping algorithm is good, it will automatically figure out most of these tricks directly from the 'bad password lists'. I would say a little more: it is probably the case that you can "steal" the dictionary from one zipping and force a zipping algorithm to use that dictionary. If this is the case, the dictionary needed reduces to 64 KiB (I believe) rather than the 600 KB that the above script requires. I don't know how much effort it takes to get zlib-with-a-static-preset running in JS but then again, I don't know how much time it took zxcvbn to reach its final form either. Using /usr/share/dict/american-english for my dictionary is a bit crap because it does not yet speak l33t, but my dictionary can be used for "correcthorsebatterystaple". XKCD estimates 44 bits = 5.5 bytes; gzip --best estimates 7 bytes, maybe more if we had more sequences ending in '1' to better compress 'password1'. (Some extra bits are to be expected purely due to the diverse number of password-guessing algorithms; 'switch one character to l33t, switch two characters to l33t, end with a number' offer a couple extra bits which XKCD ignores in order to establish a lower bound.) |
gzip does a terrible job at this. Long passwords that are in the dictionary come out as much stronger than short passwords not in the dictionary. For example, correcthorsebatterystaple increased the size by much more than ablekindpagequite despite the former being in the dictionary verbatim and the latter not.
lzma does a much better job, but all implementations I could find like to pad things out to 4-byte boundaries giving you poor resolution.