| From the outset, Unicode's goal (more so than ISO 10646's, though now they're one and the same) was to unify all existing character sets, so you'd only need one. Necessarily, then, there should not be other sets that encode things you can't encode in Unicode, since then you can't displace those sets with Unicode. So, particularly in the early life of Unicode, the goal was to collect stuff that already existed and add it to Unicode. (These days we're finished with that, and most new work is on adding things that weren't previously in any character set.) Two controversial things were done, at opposite ends of the spectrum, during this period of consolidation. At one end, what you're seeing here is the addition of copies of the entire Latin alphabet, each with some particular property that Latin users would not really consider part of the character, such as "bold" or "italic", but which _was_ preserved in some character set being used somewhere. Without this choice, if we converted a text file encoded in a way that distinguished bold and italic characters, we'd lose that bold/italic distinction, and it might be significant. This would be like when you get a black & white photocopy of a sheet that says "Ignore any text below shown in red". Um, but none of this text is red? Oh. Probably some of it was before it was photocopied. Oops. At the far end of the spectrum, a process called CJK unification took place, in which scholars of the languages using characters from the Han ("Chinese") writing system decided that although, say, a Japanese character set and a Chinese character set both had a particular character, and the Chinese and Japanese would not draw this character the same way, in some linguistic sense it's the same character (and in many cases the visual differences are quite small), and so Unicode should not encode both separately. There's a coherent technical argument for why both these types of decisions made sense, but they were nonetheless controversial. 
You should not use weird characters like italic Latin letters in new documents, but you also should not transform these characters without warning when processing an existing document as you may lose important meaning. |
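To make the "don't transform without warning" point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `compatibility_characters` is made up for illustration. It flags characters that carry a compatibility decomposition (the tagged kind, such as `<font>`), which are the ones NFKC normalization would quietly replace with plain letters. It is one way to surface the problem before normalizing, not a complete policy.

```python
import unicodedata


def compatibility_characters(text):
    """List (index, char, nfkc_form) for characters NFKC would fold away.

    Hypothetical helper: a character counts as a "compatibility character"
    here when its Unicode decomposition mapping starts with a tag such as
    "<font>" or "<super>".
    """
    found = []
    for index, ch in enumerate(text):
        mapping = unicodedata.decomposition(ch)
        if mapping.startswith("<"):
            found.append((index, ch, unicodedata.normalize("NFKC", ch)))
    return found


# U+1D44E is MATHEMATICAL ITALIC SMALL A, followed by ordinary ASCII text.
sample = "\U0001D44E + b"

for index, ch, folded in compatibility_characters(sample):
    # Report instead of silently rewriting the document.
    print(f"position {index}: {unicodedata.name(ch)} would become {folded!r}")
```

A tool that processes existing documents could run a check like this first and only normalize once the user has agreed to lose the distinction.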
Both had always bothered me deeply, but I'd never stopped to think that they're also essentially opposed in philosophy to each other. So now that I'm aware of that, I'm triply annoyed :S