by ubernostrum
2737 days ago
Inventing your own pseudo-normalization of Unicode is a worse idea than using the actual normalization forms Unicode defines. Also, if you think you can decompose without allocating memory... well, try a code point like U+FDFA. For reference, its decomposition is: U+0635 U+0644 U+0649 U+0020 U+0627 U+0644 U+0644 U+0647 U+0020 U+0639 U+0644 U+064A U+0647 U+0020 U+0648 U+0633 U+0644 U+0645 (and that doesn't begin to touch any of the potential issues with variant forms, homoglyph attacks, etc.).
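For anyone who wants to check the U+FDFA expansion themselves, here is a minimal sketch using Python's standard `unicodedata` module. It assumes the decomposition quoted above is the compatibility one (so it shows up under NFKD, not NFD); the variable names are just for illustration.

```python
import unicodedata

ligature = "\uFDFA"  # a single code point

# The sequence quoted in the comment above, written out as code points.
expected = "".join(chr(cp) for cp in [
    0x0635, 0x0644, 0x0649, 0x0020,
    0x0627, 0x0644, 0x0644, 0x0647, 0x0020,
    0x0639, 0x0644, 0x064A, 0x0647, 0x0020,
    0x0648, 0x0633, 0x0644, 0x0645,
])

# Canonical decomposition (NFD) leaves the ligature alone;
# compatibility decomposition (NFKD) expands it.
nfd = unicodedata.normalize("NFD", ligature)
nfkd = unicodedata.normalize("NFKD", ligature)

print(len(nfd), len(nfkd))
print(len(ligature.encode("utf-8")), len(nfkd.encode("utf-8")))
print(nfkd == expected)
```

One code point going in and many coming out is the point: a fixed-size scratch buffer sized for "typical" characters won't hold this without either a generous worst-case bound or an allocation.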
This is actually implemented in ZFS. (And also character-at-a-time normalization for hashing.)
I don't see how homoglyphs enter the picture. Can you explain?