Hacker News new | ask | show | jobs
by Jfreegman 2269 days ago
That's a great question, and I wish I was good enough at math to give you a sophisticated answer. But my thinking is that the entropy you might gain by allowing duplicates is negated by the huge set of weak/guessable passwords you allow. For example, the password "aaaaA1!" is probably more likely to be guessed or used by others than "agkxA1!". (I just checked on haveibeenpwned.com, and the former has been seen 12 times, while the latter has been seen 0 times. Not very scientific I know)

Though this isn't set in stone if someone wants to formally correct me.

And libsodium is indeed a pleasure to work with.

2 comments

No, your intuition is bad.

If you use passwords of a small enough size that this would really be a problem (like four digit PINs or your "aaaaA1!" example) then your password isn't delivering adequate security against brute force and so you've definitely lost.

If you use passwords that are big enough to make brute force impractical anyway then this "feature" will never make any real difference and is just a waste of time at best, and since it adds complexity it's another place to hide bugs.

Having a PwnedPasswords check (not this silly "repeating characters" test) makes sense if you allow users to enter passwords. Whereas if you generate passwords of decent length and at random then they're random so there's no purpose in checking them.

>If you use passwords of a small enough size that this would really be a problem (like four digit PINs or your "aaaaA1!" example) then your password isn't delivering adequate security against brute force and so you've definitely lost.

I can't force users to use reasonably long passwords. In the case that they don't (and some certainly won't), it's preferable that their password is still as good as it can be given the length.

When it comes to web logins, brute force is probably the least effective method of attack. Social engineering, dictionary attacks, rainbow tables, password lists, guessing and so on are all much bigger concerns. It's my thinking that most of those threats are better addressed by sacrificing a negligible amount of entropy and removing a large subset of weak passwords.

Moreover, this reduction in entropy only applies when the attacker knows the algorithm used to derive the password, making it even less relevant for the average case. (That's not an appeal to security through obscurity; just an observation).

>Whereas if you generate passwords of decent length and at random then they're random so there's no purpose in checking them.

Random != secure. "aaaaaaaaaa" could be the result of a random function. The goal is to create passwords that are both difficult to brute force, and difficult to guess, while making no assumptions about the length.

Entropy has diminishing returns; if it takes 10 million years to crack a password, another 5 million doesn't increase security. However I would re-consider if someone could provide a concrete example of how the small loss in entropy could lead to a practical vulnerability.

> I can't force users to use reasonably long passwords.

But you can, apparently, force them to use passwords that meet whatever other weird criteria you choose. So this is a statement of policy. You've decided by policy to allow "agkxA1"† but not "V+0mCx&3LmgyC" (because the latter has a "duplicate letter C") and that's crazy.

> Social engineering, dictionary attacks, rainbow tables, password lists, guessing and so on are all much bigger concerns.

Since you enumerated them let's run down the list and actually examine what they are and how your "no duplicate characters" rule addresses them or doesn't.

Social engineering: bad guys persuade the user to give them the secret. Only brick wall UX works well here, so that's WebAuthn / U2F and similar. Your rule makes no difference.

Dictionary attacks: bad guys have a list of passwords to try. Some of those passwords have "duplicate characters" and so are eliminated by your rule. But others do not.

Rainbow tables: it's weird to call out Rainbow tables specifically and suggests you're cargo culting. The Rainbow table is just a very specific optimisation of a time-space tradeoff attack on password hashing (the main thing salt mitigates). Your rule actually makes things marginally easier for attackers in this scenario, again if passwords are short you lose, if they're long enough you don't gain anything.

Password lists: You already listed a dictionary attack. A password list is just a dictionary.

Guessing: Guessing is a brute force attack, the same thing you said was "least effective".

Needlessly making your software more complicated is the way it can lead to practical vulnerability, nobody is going to be able to explain this to your satisfaction and so all I can do is recommend that people avoid it.

Your "add characters from different classes into a pool, then shuffle it to make the password" approach is just slightly worse than actual random long passwords, but the added complexity to make it slightly worse is the problem.

† Actually this isn't allowed, the algorithm actually used insists upon putting at least one character of each class into the password before any others are added. But this far too short password does obey the "No duplicate characters" requirement.

>But you can, apparently, force them to use passwords that meet whatever other weird criteria you choose.

No one is forced to use the random password generator, and even if I set a minimum limit they could just chop it up to their liking. All I can do is guide users towards a criteria I think provides the most effective security.

>You've decided by policy to allow "agkxA1"† but not "V+0mCx&3LmgyC" (because the latter has a "duplicate letter C") and that's crazy.

I didn't suggest that all passwords that contain duplicate characters are weak. But if we were to simply use a random string of characters without discrimination, then we would have to allow passwords like "aaaaaaa" and "abcdefg123", which is unacceptable in my opinion. If you agree that there should be some sort of policy that guarantees certain properties, then I'm confused by your position, as that would contradict the main point of your criticism centered around code complexity.

The list of attack vectors was not to suggest that removing duplicate characters was a solution to all of them (I disagree with your assessment but we'll leave that alone for now). I was merely highlighting the fact that brute force attacks are one of the least important factors in securing web-based accounts.

If you are able to point out a concrete example of a vulnerability introduced by disallowing duplicate characters (that includes bugs in the code caused by the added complexity) I'm all ears/eyes. If not, I'm going to call an end to this debate for now. I do appreciate your input though and it's definitely given me something to think about.

Rainbow tables are a brute force technique.
Increasing entropy mitigates brute force attacks but not necessarily rainbow table attacks, hence the distinction. If every user had a unique password, rainbow tables would be rendered useless. This is why it's important to reduce the likelihood that a randomly generated password is comprised of a common pattern.
> Increasing entropy mitigates brute force attacks but not necessarily rainbow table attacks

No. All time-space trade-offs need to expend the entire attack effort once (and usually it's considerably more). Their advantages are that you can do this in advance (timeliness) and that you can re-use the product (which is what salt mitigates).

You can very simply increase entropy until the attack effort cannot be deployed at all, regardless of when. For example the 'pass' tool lots of people have mentioned in this thread defaults to 24 character random passwords, far more than 128 bits of entropy. As a result it simply isn't possible to deploy the attack effort even once, you can neither do a Brute Force attack nor build the Rainbow Table to find these passwords.

The way the Rainbow Table got famous is its application to the LM Hash, an old (but already terrible when it shipped) Microsoft password hash. LM Hash uses two 56-bit values to represent a password, naively this looks like 112-bits of entropy but you can attack each independently so it isn't.

Rainbow Tables took attacking this from something that could in theory work on any password but you'd get bored waiting for your "crack" program to finish if it wasn't trivial to a few minutes on a fast laptop for every possible password. Because it's a time-space tradeoff, so somebody put all the effort in once and then you can re-use it.

>No

Yes. For example, if a rainbow table contains a match for "a" repeating 500 times, then that password's entropy is a non-factor. Therefore entropy in of itself does not necessarily mitigate rainbow table attacks. Entropy does necessarily mitigate naive brute force attacks. We can use this same logic for more reasonably sized passwords that are comprised of known patterns but would otherwise be difficult to brute force.

Obviously you could use extremely long passwords and be fairly certain that you won't end up with something that would be found in a rainbow table or dictionary. You could also just use passwords that are long enough that they can't be brute forced and remove common patterns from the set of possible passwords. Both of these solutions are valid, and both have theoretical weaknesses that are irrelevant in practice.

You've spent a significant amount of time trying to convince me that my random password generator is flawed in some way. Why don't you just demonstrate how instead of continuously trying to argue tangentially related theoretical points? I would hope that such an effort would be motivated by a practical concern and not merely the desire to bikeshed out of boredom.

Small correction: the password that has been seen 12 times is "aaaaA1" (no ! char). But "agkxA1" has still been seen 0 times.