by lone_haxx0r 2283 days ago
Then why include the symbols in the first place? Just in case?
3 comments

It's most likely for round-trip compatibility with another encoding. There are many Unicode codepoints that simply represent combinations of other codepoints. If you don't care about round-tripping, just normalize everything to NFKC or NFKD (the difference being that accented letters like á are one codepoint in NFKC and two codepoints for the base letter and combining mark in NFKD).
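To make the NFKC/NFKD difference concrete, here is a minimal Python sketch using the standard library's unicodedata module. The helper name codepoints and the sample characters (an accented a and the fi ligature) are my own picks for illustration.

```python
import unicodedata

def codepoints(s):
    """Hypothetical helper: list the codepoints of a string as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]

accented = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE, one precomposed codepoint
ligature = "\ufb01"   # LATIN SMALL LIGATURE FI, a compatibility character

# NFKC composes: the accented letter stays a single codepoint.
nfkc = unicodedata.normalize("NFKC", accented)

# NFKD decomposes: base letter followed by a combining acute accent.
nfkd = unicodedata.normalize("NFKD", accented)

# Both K forms replace compatibility characters, so the ligature
# becomes the two plain letters "f" and "i".
folded = unicodedata.normalize("NFKC", ligature)

print(codepoints(nfkc))
print(codepoints(nfkd))
print(folded)
```

Note that the K forms are lossy on purpose: once the ligature has been folded to plain letters there is no way to get the original codepoint back, which is why you only want this if you don't need round-tripping.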
For lossless round-tripping with legacy character sets. This is one explicit design goal of Unicode.
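A rough sketch of what that round-tripping looks like in practice, assuming Shift_JIS as the legacy character set (my choice of example, the comment doesn't name one). Shift_JIS encodes halfwidth and fullwidth katakana as different byte sequences, so Unicode needs two separate codepoints to get the same bytes back out.

```python
import unicodedata

halfwidth = "\uff76"  # HALFWIDTH KATAKANA LETTER KA
fullwidth = "\u30ab"  # KATAKANA LETTER KA

# Encode both into the legacy character set.
legacy_half = halfwidth.encode("shift_jis")
legacy_full = fullwidth.encode("shift_jis")

# The legacy encoding treats them as two different characters...
distinct_in_legacy = legacy_half != legacy_full

# ...and because Unicode keeps both, decoding gives back exactly what went in.
round_trips = (
    legacy_half.decode("shift_jis") == halfwidth
    and legacy_full.decode("shift_jis") == fullwidth
)

# Compatibility normalization merges them, which is what breaks the round trip.
merged_by_nfkc = unicodedata.normalize("NFKC", halfwidth) == fullwidth

print(distinct_in_legacy, round_trips, merged_by_nfkc)
```

If Unicode had only the one "real" katakana letter, converting a Shift_JIS file to Unicode and back could silently change its bytes.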
So we can make search that tiny bit more complicated.