by flohofwoe
2395 days ago
Well, if you write an application for a 'non-technical' international audience, you'll have to support international text output. And representing text in one of the universal Unicode encodings is still much better than the codepage mess and the region-specific multi-byte encodings like Shift-JIS we had before. UTF-8 is usually the best choice for both simple tools and 'user-facing applications', since it is backward-compatible with 7-bit ASCII (i.e. you usually don't need to change a thing in your code, at least if you just pass strings around). If you encounter a byte in a UTF-8 encoded string that has its topmost bit cleared, it's an ASCII character and definitely not part of a multi-byte sequence. If the topmost bit is set, the byte is part of a multi-byte sequence, and such sequences must be kept intact.
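To make the top-bit rule concrete, here is a minimal Python sketch. The helper names (`classify_utf8_bytes`, `split_on_ascii`, `truncate_utf8`) are hypothetical and only for illustration. It shows that plain byte-oriented code can split on an ASCII delimiter without decoding, and that cutting a string only needs care at multi-byte sequences. The truncation helper relies on the standard UTF-8 layout where continuation bytes have the bit pattern `10xxxxxx`.

```python
def classify_utf8_bytes(data: bytes) -> list[str]:
    """Label each byte: 'ascii' if the top bit is clear, otherwise 'multi'
    (the byte belongs to a multi-byte sequence)."""
    return ["ascii" if (b & 0x80) == 0 else "multi" for b in data]


def split_on_ascii(data: bytes, sep: int) -> list[bytes]:
    """Split raw UTF-8 bytes on a 7-bit ASCII separator without decoding.

    This is safe because a byte with the top bit clear never occurs inside
    a multi-byte sequence, so no character can be cut in half."""
    if sep >= 0x80:
        raise ValueError("separator must be a 7-bit ASCII byte")
    return data.split(bytes([sep]))


def truncate_utf8(data: bytes, max_len: int) -> bytes:
    """Cut to at most max_len bytes without breaking a multi-byte sequence."""
    if len(data) <= max_len:
        return data
    end = max_len
    # Continuation bytes look like 0b10xxxxxx; back up until data[end]
    # starts a character (an ASCII byte or a multi-byte lead byte).
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


print(classify_utf8_bytes("aé".encode("utf-8")))  # ['ascii', 'multi', 'multi']
for part in split_on_ascii("naïve,日本,abc".encode("utf-8"), ord(",")):
    print(part.decode("utf-8"))
print(truncate_utf8("日本".encode("utf-8"), 4).decode("utf-8"))  # 日
```

The split function is just the ordinary `bytes.split`, and that is the point: code written for ASCII keeps working on UTF-8 as long as it only looks for ASCII bytes. Operations that cut at arbitrary positions, such as truncation, are where the "keep sequences intact" rule has to be enforced explicitly.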