by ygra
3731 days ago
Except that U+FEFF is specified to act as a byte-order mark only when it appears at the start of a text stream, and in that position it does not belong to the text content. It does not corrupt data, because for any conforming application it is not part of the data. Non-conforming applications doing the wrong thing because they don't bother to follow the standard is hardly surprising. Yes, Unicode is messy and could have been better designed. It was designed to give every pre-existing encoding an easy conversion path, so it concerned itself more with making legacy content easy to convert to Unicode than with making Unicode easy to support in applications. But it's still orders of magnitude better than anything that came before it when it comes to representing text in general. And it's mostly complicated because languages and scripts are complicated.
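To illustrate the point about the leading U+FEFF not being part of the content, here is a minimal sketch in Python. It assumes the input is UTF-8; the helper name `decode_utf8_text` is made up for this example. Python's standard `utf-8-sig` codec drops one BOM at the start of the stream and leaves any later U+FEFF alone, which is roughly the behaviour described above.

```python
import codecs


def decode_utf8_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM if one is present.

    Hypothetical helper for illustration; assumes the input really is UTF-8.
    """
    # "utf-8-sig" skips EF BB BF only at the very start of the input.
    # Plain "utf-8" would keep it and yield a leading U+FEFF character.
    return raw.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
without_bom = "héllo".encode("utf-8")

# Both inputs decode to the same text content.
same_text = decode_utf8_text(with_bom) == decode_utf8_text(without_bom)

# A decoder that ignores the BOM convention leaves U+FEFF in the string.
naive = with_bom.decode("utf-8")
```

A program that uses the plain `utf-8` decoder and then complains about a stray character at the start of the string is the kind of non-conforming handling the comment refers to.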