Hacker News
by josefx 2769 days ago
For most strings I don't care about language, encoding or the related overhead. In my scripts they are best dealt with as opaque bytes, with a few specific byte patterns that are the same in ASCII and UTF-8, as well as in various other encodings.
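
A minimal sketch of what that can look like in Python, assuming a made-up `key=value` line format (the function name and the sample data are hypothetical, not from any real script). Only the ASCII bytes `=` and newline are interpreted; everything else is carried along untouched, so the same code handles UTF-8 and Latin-1 input:

```python
def parse_key_values(raw):
    """Split b'key=value' lines on ASCII delimiters without decoding anything."""
    result = {}
    for line in raw.split(b"\n"):
        key, sep, value = line.partition(b"=")
        if sep:  # skip lines that contain no '='
            # bytes.strip() only removes ASCII whitespace
            result[key.strip()] = value.strip()
    return result


# Hypothetical sample data: the same text in two encodings (\u00fc is u-umlaut).
text = "name=J\u00fcrgen\ncity=K\u00f6ln\n"
utf8_fields = parse_key_values(text.encode("utf-8"))
latin1_fields = parse_key_values(text.encode("latin-1"))
```

Both calls find the same keys, and each value comes back as exactly the bytes that went in. This works for ASCII-compatible encodings such as UTF-8 and Latin-1; it would not work for something like UTF-16, where `=` is not a single byte.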

The last Unicode issue I had was on a system with German characters, because some library assumed it had to explicitly perform encoding, and did so with a bad default setting. If the library hadn't tried to be smart, the program would have worked independently of system or language. Instead it failed on any non-English system by trying to convert a perfectly fine, system-specific encoding to UTF-8.
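
One way this kind of failure can happen, sketched in Python. This is an illustration under assumptions, not the actual library: `overeager_library` is a hypothetical stand-in, and Latin-1 is assumed as the system encoding.

```python
# Hypothetical filename as delivered by a Latin-1 system (\u00df is sharp s).
system_bytes = "Stra\u00dfe.txt".encode("latin-1")  # b'Stra\xdfe.txt'


def overeager_library(data):
    """Stand-in for a library that assumes its input is UTF-8 and re-encodes it."""
    return data.decode("utf-8").encode("utf-8")


def pass_through(data):
    """What an encoding-agnostic library would do: leave the bytes alone."""
    return data


# Pure ASCII input survives the round trip, so the bug stays hidden on English systems.
ascii_ok = overeager_library(b"street.txt") == b"street.txt"

# The Latin-1 byte 0xDF is not valid UTF-8 here, so the conversion blows up.
try:
    overeager_library(system_bytes)
    conversion_failed = False
except UnicodeDecodeError:
    conversion_failed = True

untouched = pass_through(system_bytes) == system_bytes
```

The pass-through version has nothing to get wrong: whatever encoding the system uses goes in and comes out unchanged.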