by est
4660 days ago
I totally agree that UTF-8 is a good choice for exchanging data: it's better overall than UCS-* and the other UTF-* encodings, and everyone is (and should be) using it. But you know, there are other cases besides exchange. Like I said, if your text is mainly Latin you're fine, but it's not so good if you're stuck with non-Latin BMP text such as CJK, where most characters take three bytes in UTF-8 versus two in UTF-16.
|
For the classic double-byte languages, what's your total data size after compression? In full-text search, for example, enabling compression has been enough of a win that the 2-to-3-byte expansion hasn't been a challenge, particularly since the biggest UTF-8 drawback (you can't predict a string's total byte length from its character count) isn't an issue when you're working with a data structure that records the length of each record.
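
If you want to check this on your own data, here's a minimal sketch in Python. The `encoded_sizes` helper and the sample sentence are made up for illustration, and zlib at level 9 is just a stand-in for whatever compression your store actually uses. A short sentence repeated 200 times compresses far better than real text would, so treat the compressed numbers as a shape, not a benchmark, and run it on a real corpus before drawing conclusions.

```python
import zlib


def encoded_sizes(text: str) -> dict:
    """Return raw and zlib-compressed byte counts for UTF-8 and UTF-16.

    Hypothetical helper for illustration; "utf-16-le" is used so that no
    byte-order mark is counted in the UTF-16 size.
    """
    sizes = {}
    for codec in ("utf-8", "utf-16-le"):
        raw = text.encode(codec)
        sizes[codec] = {
            "raw": len(raw),
            "zlib": len(zlib.compress(raw, 9)),
        }
    return sizes


# Made-up sample: every character here is a BMP CJK character,
# so each one costs 3 bytes in UTF-8 and 2 bytes in UTF-16.
sample = "統一碼讓世界各地的文字都能用同一種方式表示。" * 200

sizes = encoded_sizes(sample)
for codec, s in sizes.items():
    print(f"{codec:10s} raw={s['raw']:6d} bytes  zlib={s['zlib']:6d} bytes")
```

The raw sizes show the 3:2 ratio directly. What matters for the storage argument is how much of that gap is left in the `zlib` column on your actual text.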