Hacker News
by acdha 4659 days ago
I'm defining “exchange” as covering not just the outside world but also traffic between components within your own architecture. I've flushed out enough problems caused by UCS-2 / UTF-16 breakage that I've become somewhat sold on the unambiguous nature of UTF-8.
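Here is a minimal Python sketch of that kind of breakage. It is an illustration, not code from the comment. The example uses a character outside the Basic Multilingual Plane, U+1F600, which UTF-16 stores as a surrogate pair. Code that treats each 16-bit unit as one character (the UCS-2 assumption) miscounts it and can cut it in half. UTF-8 continuation bytes always match the bit pattern `10xxxxxx`, so a truncation routine can always back up to a character boundary. All function names are hypothetical.

```python
import struct


def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (decoded from little-endian bytes)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def ucs2_truncate(s: str, n_units: int) -> bytes:
    """Naive truncation assuming one 16-bit unit == one character (UCS-2).
    This can leave a lone high surrogate at the end."""
    return s.encode("utf-16-le")[: 2 * n_units]


def utf8_truncate(s: str, max_bytes: int) -> bytes:
    """Truncate UTF-8 to at most max_bytes without splitting a character.
    Continuation bytes are 0b10xxxxxx, so back up until the first excluded
    byte is a lead byte."""
    data = s.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


s = "a\U0001F600"  # 2 characters
print(len(s), len(utf16_units(s)))   # 2 characters, 3 UTF-16 units
print(ucs2_truncate(s, 2))           # ends in a lone high surrogate
print(utf8_truncate(s, 3))           # backs up to b'a', still valid UTF-8
```

The UTF-8 routine needs no lookup tables or lookahead, which is one concrete sense in which the encoding is unambiguous: any byte tells you whether it starts a character.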

For the classic double-byte languages, what's your total data size after compression? In full-text search, for example, enabling compression has been enough of a win that the 2-to-3-byte expansion (characters in these scripts typically take 2 bytes in UTF-16 but 3 in UTF-8) hasn't been a problem. This is particularly true because UTF-8's biggest drawback, the inability to predict a string's byte length from its character count, doesn't matter when the data structure records the length of each record.
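A rough way to check both points on your own data is sketched below. It assumes zlib at level 9 as a stand-in compressor, because the comment doesn't say which one the search engine uses, and it uses a hypothetical 4-byte little-endian length prefix per record. It compares raw and compressed sizes in UTF-8 and UTF-16. It also shows that once each record's byte length is stored, UTF-8's variable width doesn't get in the way of reading records back.

```python
import struct
import zlib


def encoded_sizes(text: str) -> dict[str, int]:
    """Raw and zlib-compressed byte sizes of text in UTF-8 and UTF-16."""
    sizes = {}
    for codec in ("utf-8", "utf-16-le"):
        raw = text.encode(codec)
        sizes[codec] = len(raw)
        sizes[codec + "+zlib"] = len(zlib.compress(raw, 9))
    return sizes


def pack_records(records: list[str]) -> bytes:
    """Prefix each UTF-8 record with its byte count (hypothetical layout)."""
    out = bytearray()
    for r in records:
        data = r.encode("utf-8")
        out += struct.pack("<I", len(data))
        out += data
    return bytes(out)


def unpack_records(blob: bytes) -> list[str]:
    """Read records back using the stored lengths; no character scanning needed."""
    records, pos = [], 0
    while pos < len(blob):
        (n,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        records.append(blob[pos:pos + n].decode("utf-8"))
        pos += n
    return records


sample = "日本語のテキスト"  # placeholder text; substitute a real corpus sample
print(encoded_sizes(sample))  # uncompressed: 24 bytes UTF-8 vs 16 bytes UTF-16
print(unpack_records(pack_records(["日本", "abc"])))
```

The compressed sizes depend heavily on the corpus and compressor, so this sketch implies no particular ratio. A short placeholder string like the one above is too small for meaningful compression numbers.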