Entropy and min-entropy are properties of distributions, not of individual samples from those distributions. So there's no meaning to "the entropy of each chosen password".
Despite that slight misuse of terminology, the point stands: the article talks about estimating the entropy of a distribution used for generating a password, but the important thing is the “distribution” an attacker is using for guessing the password.
A single password should instead be treated as a sample from a (plausible) attacker’s distribution, and the complexity of that password can be used to estimate the size of the sample space required for that plausible attacker (as in, how many guesses/how much work they’ll have to do). This is, AIUI, the approach used by libraries like https://zxcvbn-ts.github.io/zxcvbn/
The entropy of a distribution for generating passwords matters when generating them in bulk, such as OTPs or implementing a password manager. This doesn’t seem to be the situation being discussed in the article, which is more about rating a user-provided password.
Zxcvbn is also a good idea, but it's a complementary approach. The user or password manager should generate secure passwords (using a high-min-entropy distribution), and the website or application should check that they're secure (using zxcvbn or similar).
Of these two approaches, a high-entropy generation method gives more confidence. It gives a mathematical strength "guarantee": if you design and follow the method correctly, then an attacker, whether or not they know the generation method, is mathematically unlikely to guess your password quickly no matter what order they guess in. "Guarantee" is in quotes because of course the attacker could get very lucky or the user could get unlucky (eg generate a uniformly random 8-character string and it happens to be "password"), and also if there's eg an implementation flaw then your guarantee isn't worth the pixels it's printed on.
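To make "high-entropy generation method" concrete, here is a minimal sketch, assuming Python's `secrets` module as the randomness source and a 62-symbol alphabet. Both are my choices for illustration, not anything from the article. Because every character is drawn uniformly and independently, min-entropy and Shannon entropy coincide at length * log2(alphabet size).

```python
import math
import secrets
import string

# Illustrative alphabet (an assumption): 26 + 26 letters and 10 digits = 62 symbols.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=20):
    """Draw each character uniformly and independently from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy of the *generator* (not of any one output), in bits."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(generate_password())
    print(f"{entropy_bits(20):.1f} bits for 20 characters")
```

Note that the number describes the method: an unlucky draw of a guessable string is still possible, just very unlikely.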
By contrast, zxcvbn has no guarantee, because it doesn't use a huge curated dictionary and generation mechanism that the attacker is likely to use. So in addition to missing well-known passwords like "correct horse battery staple", it will miss bad passwords related to current events.
A single password represents a distribution of possible bit values for each byte within it. The password itself is a distribution of characters used within the password.
In fact, the author's article makes this very point, which is why I pointed out the logical flaw in the thinking.
I'll reduce N to 6 to simplify the author's absurd example, but it can expand to any N.
If we accept the argument that you roll an N-sided die (6 in our case), where the highest value yields one strong password and every other value yields the word "password", the flaw is in how this logic is applied.
It makes little difference whether you look at this as the bytes involved in the entire set or as the average of all passwords within the set: it's going to come out looking like you are secure.
This means what they're attacking is all permutations of the following set of characters:
a, C, d, h, j, k, l, m, o, p, r, s, w, G, 0, 2, 5, 9, ;, ], @, ^, (
What an attacker must know though, is the character set used within, as well as the length.
This is the logical flaw the author made in their analysis. For an attacker, the entropy of an individual string is taken as possible character permutations required to discover the true password and NOT permutations of the entire strings themselves.
If you look at the values for each string presented in our set, what an attacker has to attack is:
a, d, o, p, r, s, w
C, h, j, k, l, m, G, 0, 2, 5, 9, ;, ], @, ^, (
But in order to attack these, they need to try the full set:
I don't understand your argument at all. Why does an attacker need to try a full set of characters? Real attackers try from dictionaries or password generation methods (eg dictionary + numbers, dictionary + dictionary + number + symbol, etc), and "password" is one of the first passwords they'll try. They do this because they don't know exactly how you generated the password, but due to password leaks, they do have a pretty good idea of how most people generate passwords.
In principle, you could estimate a password's strength by the order in which a cracker would be expected to guess it. But that's a pain, depends on the password cracker being used, and can change at any time. Also, it's not "entropy", which is a well-defined mathematical concept and is what the linked article is about.
Entropy is supposed to be a bound that even if the attacker knows your generation method, they won't be able to do better than brute-force search. For this, the author is correct that min-entropy or a similarly conservative measure is the right one; though for the most common (uniform) generation methods this is the same as Shannon entropy.
Entropy of the set of characters used in your password (well, sets don't have entropy, but let's say of the uniform distribution on that set) isn't the same as entropy of password generation mechanism, because the attacker might have more information. For example, if he knows (or correctly guesses) that your password is a dictionary word, then this is super helpful information that isn't captured in the entropy of the bytes.
> I don't understand your argument at all. Why does an attacker need to try a full set of characters? Real attackers try from dictionaries or password generation methods (eg dictionary + numbers, dictionary + dictionary + number + symbol, etc), and "password" is one of the first passwords they'll try. They do this because they don't know exactly how you generated the password, but due to password leaks, they do have a pretty good idea of how most people generate passwords.
I'm well aware. How does this help the attacker attacking the higher-entropy string I outlined?
How difficult is it for an attacker to attack a password consisting of four lowercase English dictionary words?
If you run some of these permutations through John, you'll see how long it takes just to generate even quick, broken hashes like MD5 for them, versus for a long string of essentially typeable byte data.
> Entropy is supposed to be a bound that even if the attacker knows your generation method, they won't be able to do better than brute-force search. For this, the author is correct that min-entropy or a similarly conservative measure is the right one; though for the most common (uniform) generation methods this is the same as Shannon entropy.
I'm not sure who has dictated that this is supposed to be how entropy is used for password management. Do you have any references here? Because otherwise it looks like it's still the author and yourself assigning a set of rules to something that doesn't actually apply in the real world and doesn't represent how things are used in practice.
My entire point is that the author has taken an incredibly narrow definition of what entropy must be applied to (only to the distribution of the overall set of characters used in the example) and how it must be used in this circumstance, and argued against that.
Where it falls down is this: The entire purpose of using entropy as a measure of difficulty of cracking a password is precisely the character set approach. If you were to type "password" into any system employing a Shannon entropy analysis on the set of characters required to generate that password, you would at worst have to generate 26^8 combinations. Dictionary attacks are good because they reduce that from around 208 billion to about half a million. 208 billion is not a high enough number, and these systems will tell you it's weak. Smarter ones will probably alert you that it's a dictionary word as well.
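For reference, the two numbers in that paragraph can be checked with a few lines of Python. This is only a sketch: the dictionary size of 500,000 is the "about half a million" figure from above, not a measured word list.

```python
import math

# Brute force over 8 lowercase letters versus a dictionary attack.
charset_space = 26 ** 8        # every 8-character lowercase string
dictionary_space = 500_000     # "about half a million" (assumed round figure)

print(f"charset search space:    {charset_space:,} (~{math.log2(charset_space):.1f} bits)")
print(f"dictionary search space: {dictionary_space:,} (~{math.log2(dictionary_space):.1f} bits)")
print(f"reduction factor:        ~{charset_space / dictionary_space:,.0f}x")
```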
If the issue is that people are "misusing" the term entropy for passwords here, that's fine but that's a different article (and I'd still disagree).
> I'm well aware. How does this help the attacker attacking the higher-entropy string I outlined?
Well, suppose the attacker is aware of your password generation method (e.g. it's in an open-source password generator, or you wrote down your method and someone stole the description). You have specified the generator as { 5/6 "password", 1/6 "hj5^@l2jl9GGk;Clkm(0]" }. In this case, the attacker will guess the password pretty quickly -- on the second guess at worst -- even in the 1/6th case that it is "hj5^@l2jl9GGk;Clkm(0]".
This is because the string "hj5^@l2jl9GGk;Clkm(0]" doesn't intrinsically have entropy. The generation method is what has entropy -- but in this example, not very much entropy, which is why you got hacked.
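A quick sketch of that calculation in Python, using the 5/6 and 1/6 generator from above (the function names are mine):

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def min_entropy(probs):
    """Min-entropy in bits: -log2 of the single most likely outcome."""
    return -math.log2(max(probs))


# The generator described above: the entropy belongs to this
# distribution, not to either string.
generator = {
    "password": 5 / 6,
    "hj5^@l2jl9GGk;Clkm(0]": 1 / 6,
}
probs = list(generator.values())

# An attacker who knows the generator guesses in order of probability.
guess_order = sorted(generator, key=generator.get, reverse=True)
expected_guesses = sum(
    (rank + 1) * generator[pw] for rank, pw in enumerate(guess_order)
)

if __name__ == "__main__":
    print(f"Shannon entropy:  {shannon_entropy(probs):.3f} bits")
    print(f"min-entropy:      {min_entropy(probs):.3f} bits")
    print(f"expected guesses: {expected_guesses:.3f}")
```

Both figures come out well under one bit (roughly 0.65 and 0.26), and the expected number of guesses is 7/6, however complicated the second string looks.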
> How difficult is it for an attacker to attack a password consisting of four lowercase English dictionary words?
It depends on the dictionary and the cost to guess a password. If you choose from, say, the 3000 most common dictionary words, then it will take the attacker 3000^4 = 81 trillion guesses to guess 4 of them. If the application has appropriately used salt and strengthening, such that it takes eg 10 core-ms to check a guess (with a function like argon2 that's annoying to run on a GPU), and the attacker throws 1000 cores at the problem, then this will take about 81e12 * 10e-3 / 1000 / 86400 / 365 ≈ 25.7 years to exhaust the entire space, or half that on average.
Of course, the attacker could use more than 1000 cores, so this difficulty is surmountable, but it is pretty expensive to break. If your account is high-value, then 5 or 6 words would be a better choice. Also, if the service doesn't strengthen the password, and the attacker can acquire the hash, then 4 words is definitely not enough.
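The same back-of-the-envelope estimate as a small Python function, so you can plug in your own numbers. The parameter names are mine, and the inputs (3000 words, 10 core-ms per guess, 1000 cores) are the illustrative figures from above, not measurements.

```python
SECONDS_PER_YEAR = 86400 * 365


def years_to_exhaust(dictionary_size, words, seconds_per_guess, cores):
    """Wall-clock years to try every passphrase of `words` words."""
    guesses = dictionary_size ** words
    core_seconds = guesses * seconds_per_guess
    return core_seconds / cores / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n_words in (4, 5, 6):
        years = years_to_exhaust(3000, n_words, 10e-3, 1000)
        print(f"{n_words} words: {years:,.1f} years to exhaust, half that on average")
```

Each extra word multiplies the cost by the dictionary size (3000 here), which is why 5 or 6 words is so much safer than 4.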
> I'm not sure who has dictated that this is supposed to be how entropy is used for password management.
I'm not sure what you mean by "supposed to be used" or "dictated". You don't have to use entropy to analyze password management, but it does make for a good analysis. The theory has been around for decades. See eg https://diceware.dmuth.org.
Theorem: if you sample a fresh secret (e.g. a password) from a distribution D of min-entropy x bits, and if an attacker then tries to guess it based on no other information (i.e. they might know D but they didn't like, already phish the secret), then in N guesses they will succeed with probability at most N/2^x.
Proof: By definition, the probability that any one guess is correct is at most 1/2^x, so the overall probability is at most N/2^x by the union bound. Easy peasy.
Note that this theorem does not hold if min-entropy is replaced by Shannon entropy, which is usually what people mean when they say "entropy" without qualifications. Note also that it makes no assumptions about character sets. The character set would only be relevant if each character were chosen iid, or if the attacker decides to attack the password as if this were so.
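As a sketch, the bound from the theorem is a one-liner in Python. The two examples reuse numbers from earlier in this thread (four words from a 3000-word list, and the 5/6 "password" generator); the function name is mine.

```python
import math


def guess_success_bound(min_entropy_bits, guesses):
    """Upper bound N / 2^x on the attacker's success probability, capped at 1."""
    return min(1.0, guesses / 2 ** min_entropy_bits)


# Four words drawn uniformly from a 3000-word list: x = 4 * log2(3000).
passphrase_bits = 4 * math.log2(3000)
bound_passphrase = guess_success_bound(passphrase_bits, 10 ** 9)

# The {5/6 "password", 1/6 other} generator: x = -log2(5/6).
weak_bits = -math.log2(5 / 6)
bound_weak = guess_success_bound(weak_bits, 1)

if __name__ == "__main__":
    print(f"passphrase, 1e9 guesses: at most {bound_passphrase:.2e}")
    print(f"weak generator, 1 guess: at most {bound_weak:.3f}")
```

For the weak generator the bound is already 5/6 after a single guess, which matches the intuition that it offers essentially no protection.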