by tomjakubowski 3387 days ago
> A reasonable test for passwords is to run them through an IDNA checker, which checks whether a string is acceptable as a domain name component. This catches most weird stuff, such as mixed left-to-right and right-to-left symbols, zero-width markers, homoglyphs, and emoji.

Why test this at all? It's not as if a website should ever need to render a user's password as text. Is there another use case for excluding this "weird stuff" that I'm not seeing?

2 comments

Suppose I include 'ü': LATIN SMALL LETTER U WITH DIAERESIS in my password. I switch to a different browser/OS/language, and now when I enter "ü" I get 'u': LATIN SMALL LETTER U + ' ̈': COMBINING DIAERESIS. I can't log in anymore, even though what I typed is identical and the two sequences are defined to be canonically equivalent. Especially if the password is hashed before it is compared, you can't treat it as just a sequence of bytes.
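A minimal sketch of that failure, using only Python's standard library. The helper names `naive_hash` and `normalized_hash` are made up for illustration, and SHA-256 stands in for whatever hash a real site uses (an actual password store would use a slow password-hashing function instead). Normalizing to NFC before hashing is one possible fix, not the only one.

```python
import hashlib
import unicodedata

precomposed = "\u00fc"   # 'ü' as one code point: LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # 'u' followed by COMBINING DIAERESIS


def naive_hash(password: str) -> str:
    """Hash the UTF-8 bytes as-is (hypothetical helper, illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def normalized_hash(password: str) -> str:
    """Normalize to NFC first, so canonically equivalent input hashes the same."""
    return naive_hash(unicodedata.normalize("NFC", password))


# The two spellings render identically but are different byte sequences.
print(naive_hash(precomposed) == naive_hash(decomposed))            # False
print(normalized_hash(precomposed) == normalized_hash(decomposed))  # True
```

Whichever normalization form is chosen, it has to be applied both when the password is set and on every login attempt.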

You don't need to use IDNA for this, though. There are standards specifically for dealing with Unicode passwords, such as SASLprep (RFC 4013) and PRECIS (RFC 7564).

I would not actually disallow these characters, but you could warn the user about the existence of problematic characters in their password of choice.

If I want to use äöüßÄÖÜẞ because I'm confident that I can properly type them on all devices I'll need to type them on, then let me. It's not your concern what method of input I'm using.

And maybe, just maybe, using Latin characters is actually more of a hassle for a user anyway. (I think the risk of that occurring is low, but still. At the moment, it's a self-fulfilling prophecy that all users have a proper method to input Latin script available. We simply force them to have one.)

Edit: And the confusion is also possible with what looks like plain Latin text. U+0430 (CYRILLIC SMALL LETTER A) looks exactly like "a", but has a different code point and thus ruins the hash.
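To make the homoglyph case concrete, here is a small sketch with Python's `unicodedata` module. U+0430 is CYRILLIC SMALL LETTER A; the helper `equal_under_any_normalization` is a made-up name for illustration. Unlike the precomposed vs. combining case, none of the standard normalization forms makes the two characters equal, so normalization alone would not catch this.

```python
import unicodedata

latin_a = "a"            # U+0061
cyrillic_a = "\u0430"    # U+0430, renders like a Latin "a" in most fonts

print(unicodedata.name(latin_a))     # LATIN SMALL LETTER A
print(unicodedata.name(cyrillic_a))  # CYRILLIC SMALL LETTER A


def equal_under_any_normalization(x: str, y: str) -> bool:
    """Hypothetical helper: True if any standard normalization form unifies x and y."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, x) == unicodedata.normalize(form, y)
        for form in forms
    )


# Different code points, different UTF-8 bytes, and no normalization form merges them.
print(latin_a.encode("utf-8") == cyrillic_a.encode("utf-8"))  # False
print(equal_under_any_normalization(latin_a, cyrillic_a))     # False
```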

I agree: if it can live in a byte buffer and not crash the transport mechanism or hash function, then it's good enough for me.