A single password represents a distribution of possible bit values for each byte within it. The password itself is a distribution of the characters it uses. In fact, the author's article makes this very point, which is why I pointed out the logical flaw in the thinking. I'll reduce N to 6 to simplify the author's absurd example, but the argument extends to any N. Suppose, as the argument goes, you roll an N-sided die (6 in our case): the highest roll gives you one strong password, and every other roll gives you the word "password". The flaw is in how this logic is applied. Imagine this is our set of possible values:

password, password, password, password, password, hj5^@l2jl9GGk;Clkm(0]

It makes little difference whether you look at this as the bytes involved in the entire set or as the average over all passwords in the set; either way it comes out looking like you are secure. This means what they're attacking is all permutations of the following set of characters:

a, C, d, h, j, k, l, m, o, p, r, s, w, G, 0, 2, 5, 9, ;, ], @, ^, (

What an attacker must know, though, is the character set used within each password, as well as its length.
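A minimal Python sketch of the pooled view described above, assuming the six-sided version of the example. Pooling every outcome in the set and collecting the distinct characters reproduces the 23-character list; the helper name `pooled_charset` is hypothetical.

```python
# The example set: five rolls give "password", one roll gives the strong string.
outcomes = ["password"] * 5 + ["hj5^@l2jl9GGk;Clkm(0]"]

def pooled_charset(strings):
    """Distinct characters appearing anywhere in the whole set of strings."""
    return set("".join(strings))

chars = pooled_charset(outcomes)
print(len(chars))  # 23 distinct characters across the whole set
```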
This is the logical flaw the author made in their analysis. For an attacker, the entropy of an individual string is determined by the character permutations required to discover the true password, NOT by permutations of the entire set of strings. If you look at the values for each string in our set, what an attacker has to attack is:

a, d, o, p, r, s, w
C, h, j, k, l, m, G, 0, 2, 5, 9, ;, ], @, ^, (

But in order to attack these, they need to try the full character class for each:

a-z
a-zA-Z0-9;:[]!@#$%^&*(){}

One of these will be VASTLY easier to break.
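To put rough numbers on "VASTLY easier", here is a small sketch that computes the brute-force search space for each string, assuming the attacker knows the exact length and tries the full character class listed above (26 symbols for a-z, 78 for the larger class). It ignores shorter lengths and smarter guessing orders, so treat it as an illustration; the function name `brute_force_bits` is hypothetical.

```python
import math
import string

# Character classes an attacker would try, as listed in the comment.
LOWER = string.ascii_lowercase
FULL = string.ascii_letters + string.digits + ";:[]!@#$%^&*(){}"

def brute_force_bits(length, alphabet):
    """log2 of the number of strings of `length` over `alphabet`:
    the brute-force search space when length and class are known."""
    return length * math.log2(len(set(alphabet)))

weak = "password"
strong = "hj5^@l2jl9GGk;Clkm(0]"

# Sanity check: each password actually fits inside the class assigned to it.
assert set(weak) <= set(LOWER)
assert set(strong) <= set(FULL)

weak_bits = brute_force_bits(len(weak), LOWER)     # 8 * log2(26)
strong_bits = brute_force_bits(len(strong), FULL)  # 21 * log2(78)
print(f"{weak_bits:.1f} vs {strong_bits:.1f} bits")  # 37.6 vs 132.0 bits
```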
In principle, you could estimate a password's strength by the order in which a cracker would be expected to guess it. But that's a pain, depends on the password cracker being used, and can change at any time. Also, it's not "entropy", which is a well-defined mathematical concept and is what the linked article is about.
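As a rough illustration of why a guess-order measure depends on the cracker, here is a sketch of one hypothetical strategy (not the behavior of any real tool): try a tiny wordlist first, then every lowercase string in order of length. The wordlist, `guess_number`, and the ordering are all assumptions; change any of them and the numbers change.

```python
import string

# Hypothetical cracker strategy, for illustration only:
# 1) try a small wordlist, 2) then every lowercase string, shortest first.
WORDLIST = ["123456", "password", "qwerty", "letmein"]  # illustrative only
ALPHABET = string.ascii_lowercase

def guess_number(pw, wordlist=WORDLIST, alphabet=ALPHABET):
    """1-based position at which this strategy would guess `pw`, or None if never."""
    if pw in wordlist:
        return wordlist.index(pw) + 1
    if not pw or any(c not in alphabet for c in pw):
        return None  # outside this strategy's search space entirely
    base = len(alphabet)
    # All strings strictly shorter than pw are tried first...
    shorter = sum(base ** n for n in range(1, len(pw)))
    # ...then pw's index among strings of its own length (base-26 digits).
    index = 0
    for c in pw:
        index = index * base + alphabet.index(c)
    return len(wordlist) + shorter + index + 1

print(guess_number("password"))               # found early via the wordlist
print(guess_number("hj5^@l2jl9GGk;Clkm(0]"))  # None: never reached
```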
Entropy is supposed to be a bound: even if the attacker knows your generation method, they won't be able to do better than brute-force search. For this, the author is correct that min-entropy or a similarly conservative measure is the right one, though for the most common (uniform) generation methods it is the same as Shannon entropy.
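A sketch comparing the two measures on the die example, assuming the strong outcome is a string drawn uniformly from all 78^21 strings of the larger character class (an assumption made here for illustration). Outcomes are grouped as (probability, count) pairs so the huge space never has to be enumerated; the helper names are hypothetical.

```python
import math

def shannon_entropy(groups):
    """Shannon entropy in bits; `groups` is a list of (p, count) pairs,
    meaning `count` distinct outcomes that each have probability `p`."""
    return -sum(count * p * math.log2(p) for p, count in groups if p > 0)

def min_entropy(groups):
    """Min-entropy in bits: -log2 of the single most likely outcome."""
    return -math.log2(max(p for p, _ in groups))

M = 78 ** 21  # assumed size of the strong-password space

# Die example: 5/6 -> "password"; 1/6 -> one of M strong strings, uniformly.
die = [(5 / 6, 1), ((1 / 6) / M, M)]
print(shannon_entropy(die))  # roughly 22.6 bits: looks respectable
print(min_entropy(die))      # roughly 0.26 bits: the attacker's first guess usually wins

# Uniform choice over the same M strings: both measures agree (about 132 bits).
uniform = [(1 / M, M)]
print(shannon_entropy(uniform), min_entropy(uniform))
```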
The entropy of the set of characters used in your password (well, sets don't have entropy, but let's say of the uniform distribution on that set) isn't the same as the entropy of the password generation mechanism, because the attacker might have more information. For example, if he knows (or correctly guesses) that your password is a dictionary word, then this is super helpful information that isn't captured in the entropy of the bytes.
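A small sketch of that gap for "password": the first function treats each character as uniform over the characters the password actually uses, and the second assumes the word was drawn uniformly from a dictionary. The 50,000-word dictionary size is a hypothetical figure chosen only for illustration; real wordlists vary widely.

```python
import math

def charset_bits(pw):
    """Bits if each character were drawn uniformly from the set of characters
    that actually appear in `pw` (the 'entropy of the bytes' view)."""
    if not pw:
        return 0.0
    return len(pw) * math.log2(len(set(pw)))

def dictionary_bits(dictionary_size):
    """Bits if the password is one word drawn uniformly from a dictionary."""
    return math.log2(dictionary_size)

print(f"{charset_bits('password'):.1f}")  # 8 * log2(7)
print(f"{dictionary_bits(50_000):.1f}")   # hypothetical dictionary size
```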