HN Mirror


by ibsulon 5920 days ago

I just don't understand this.

A file is very, very, very simple to convert once. Tell your developers, "If you don't encode in UTF-16, you will have a performance penalty. Set your file encodings to UTF-16 too. You weren't doing complex internationalization work before, so it's really not that big a deal."

I worked with someone who had been on the ICU project, and he argued that UTF-16 is the best compromise for most cases. If you're working primarily in the western character set, UTF-8 is attractive, but that comes at the expense of others.
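To make the trade-off concrete, here is a rough sketch in Python that compares encoded sizes for the same kind of greeting in two scripts. The helper name `encoded_sizes` and the sample strings are mine, just for illustration; this only measures storage size, not processing speed.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }


# Hypothetical sample strings, chosen only to contrast ASCII with CJK text.
samples = {
    "english": "Hello, world",
    "japanese": "こんにちは世界",
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
# english {'utf-8': 12, 'utf-16': 24}
# japanese {'utf-8': 21, 'utf-16': 14}
```

ASCII text doubles in size under UTF-16, while the Japanese sample is a third smaller than in UTF-8, which is roughly the "at the expense of others" point.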

And frankly, if you don't roll it yourself, what are you going to use other than ICU?

2 comments

drm237 5920 days ago

I would guess file encoding is not the limiting factor at all, since opcode caches often mean the file is only read once anyway. The problem is that you get input from all sorts of other sources, like forms, databases, web services, etc., most of which aren't UTF-16.
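As a rough illustration of that boundary cost, here is a minimal Python sketch of what a UTF-16 runtime would have to do to every piece of incoming data. The function name `to_internal_utf16` and the default source encoding are assumptions for the example, not anything PHP actually exposes.

```python
def to_internal_utf16(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Transcode external bytes into UTF-16 (little-endian, no BOM).

    Every form field, database row, or web service response would pay
    for this decode-and-re-encode step, whatever the source files use.
    """
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on bad input
    return text.encode("utf-16-le")


# Hypothetical form input arriving as UTF-8 bytes.
form_value = "café".encode("utf-8")
print(to_internal_utf16(form_value))
```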


wingo 5920 days ago

> And frankly, if you don't roll it yourself, what are you going to use other than ICU?

GNU libunistring: http://www.gnu.org/software/libunistring/


ibsulon 5919 days ago

For PHP, at least, there is a license compatibility issue.
