Hacker News
by tracker1 1232 days ago
While the point about overlong encodings is true, there are multiple ways to do composition[1]: brown skin + high-five or high-five + brown skin, etc.

That's part of why it's a good idea to normalize input before password hashing, for example. Using emoji as passphrase input will likely become more common over time.
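A rough sketch of what "normalize before hashing" could look like in Python. The helper names, the choice of NFC, the PBKDF2 parameters, and the salt are all assumptions for illustration, not a recommendation for any particular system:

```python
import hashlib
import unicodedata


def normalize_passphrase(passphrase: str) -> bytes:
    """Hypothetical helper: bring the input to one canonical form (NFC, by
    assumption) and only then encode it to UTF-8 bytes."""
    return unicodedata.normalize("NFC", passphrase).encode("utf-8")


def hash_passphrase(passphrase: str, salt: bytes) -> bytes:
    """Hypothetical helper: hash the normalized bytes. PBKDF2-HMAC-SHA256 and
    the iteration count are placeholders chosen for the demo."""
    return hashlib.pbkdf2_hmac("sha256", normalize_passphrase(passphrase), salt, 100_000)


# Same text as the user sees it, but two different code point sequences.
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

salt = b"demo-salt-not-for-production"

# Without normalization the two strings differ, so their hashes would too.
print(composed == decomposed)
# With normalization both inputs produce the same hash.
print(hash_passphrase(composed, salt) == hash_passphrase(decomposed, salt))
```

The same applies when only the order of combining marks differs, e.g. "a\u0307\u0323" and "a\u0323\u0307": canonical reordering during normalization maps both to one sequence.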

1. https://en.wikipedia.org/wiki/Unicode_equivalence

1 comments

Yes, composing characters are what I was referring to with "higher level Unicode shenanigans". It doesn't stop there, though: many people would say that "а" (Cyrillic, U+0430) and "a" (Latin, U+0061) are encodings of the same character, even if Unicode thinks otherwise. All of that is above the concerns of UTF-8, though, which only cares about encoding code points into byte sequences.
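To make the distinction concrete, here is a small Python sketch. It assumes the first "а" above is CYRILLIC SMALL LETTER A (U+0430); the variable names are made up for the example:

```python
import unicodedata

latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A (assumed to be the look-alike meant above)

# Unicode treats them as distinct characters with distinct names.
print(unicodedata.name(latin_a))
print(unicodedata.name(cyrillic_a))

# None of the four normalization forms maps one onto the other.
still_distinct = all(
    unicodedata.normalize(form, latin_a) != unicodedata.normalize(form, cyrillic_a)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
print(still_distinct)

# UTF-8 just turns each code point into bytes, with no opinion on look-alikes.
latin_bytes = latin_a.encode("utf-8")        # one byte: 0x61
cyrillic_bytes = cyrillic_a.encode("utf-8")  # two bytes: 0xD0 0xB0
print(latin_bytes, cyrillic_bytes)
```

So normalization handles the composition cases, but look-alikes like these would need something on top of it (a confusables check, say), and UTF-8 itself is not involved in either.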