by acdha
4659 days ago
You'd have to have rather weird data for it to be anywhere near 50% larger for real text: even if you only use Chinese, once you have punctuation, Arabic numerals, quotes, URLs, HTML, etc., the averages cancel out more than you might think. You'd also need a completely incompetent search engine design for that to remotely approach 50% more time to query or index. If you were assigned the task of indexing the UTF-8 worst-case corpus, nothing would stop you from designing a custom internal encoding while enjoying the many technical advantages UTF-8 gives you in every other area. Internal details like compression are much easier to change than external interfaces, which must be coordinated (this is why JavaScript still has such painful Unicode support even though browsers handle almost everything well in markup).
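
To make the size argument concrete, here is a rough sketch that compares encoded sizes, assuming the "50% larger" figure is measured against UTF-16 (the comment does not name the baseline). The helper `encoded_sizes` and both sample strings are made up for illustration; they are not taken from any real corpus.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count without a BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: bare Chinese text, and similar text wrapped in
# the kind of ASCII markup, URL and digits a web page would carry.
pure_cjk = "统一码是一种字符编码标准"
html_cjk = (
    '<p><a href="https://example.com/wiki/Unicode">统一码</a>'
    "是一种字符编码标准，发布于1991年。</p>"
)

for label, sample in (("pure CJK", pure_cjk), ("CJK in HTML", html_cjk)):
    utf8_len, utf16_len = encoded_sizes(sample)
    print(f"{label}: utf-8={utf8_len} utf-16={utf16_len} "
          f"ratio={utf8_len / utf16_len:.2f}")
```

The bare CJK string is the worst case: every character takes 3 bytes in UTF-8 against 2 in UTF-16, so the ratio is 1.5. In the HTML sample each ASCII character costs 1 byte instead of 2, which more than offsets the CJK overhead, so UTF-8 comes out smaller there.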
That's exactly how those UTF-X and UCS-Y encodings were invented, right?
The point is, this beast is called Unicode. How ironic.