Hacker News
by flohofwoe 2395 days ago
Well if you write an application for a 'non-technical' international audience, you'll have to support international text output. And representing text as one of the universal Unicode encodings is still much better than the codepage mess and region-specific multi-byte encodings like Shift-JIS we had before.

UTF-8 is usually the best choice for both simple tools and 'user-facing applications' since it is backward-compatible with 7-bit ASCII (i.e. you usually don't need to change a thing in your code, at least if you just pass strings around).
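A minimal Python sketch of the backward-compatibility point (the helper name `is_ascii_compatible` is made up for illustration): for pure 7-bit ASCII text the UTF-8 bytes are identical to the ASCII bytes, so code that only shuffles bytes around doesn't notice the difference.

```python
def is_ascii_compatible(text: str) -> bool:
    """True if `text` produces the same bytes under ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False  # contains characters outside 7-bit ASCII

print(is_ascii_compatible("hello, world"))  # True
print(is_ascii_compatible("héllo"))         # False
print("héllo".encode("utf-8"))              # b'h\xc3\xa9llo': é becomes two bytes, both >= 0x80
```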

If you encounter a byte in a UTF-8 encoded string that has the topmost bit cleared, it's an ASCII character and definitely not part of a multi-byte sequence. If the topmost bit is set, the byte is part of a multi-byte sequence, and such sequences must remain intact.
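A sketch of that rule in Python, assuming the input is valid UTF-8 (`classify_byte` and `truncate_utf8` are hypothetical helpers, not from any library). It goes one small step further: bytes with the top bit set are either lead bytes (`11xxxxxx`) or continuation bytes (`10xxxxxx`), and that distinction is what lets you cut a byte buffer without breaking a sequence in half.

```python
def classify_byte(b: int) -> str:
    """Classify one byte of UTF-8 encoded text by its high bits."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: plain ASCII, never part of a sequence
    if (b & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx: inside a multi-byte sequence
    return "lead"              # 11xxxxxx: starts a multi-byte sequence

def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut `data` to at most `limit` bytes without splitting a sequence.

    Assumes `data` is valid UTF-8.
    """
    if len(data) <= limit:
        return data
    cut = limit
    # If the first byte we'd drop is a continuation byte, its sequence
    # started earlier: back up to that sequence's lead byte and drop it too.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]

encoded = "naïve".encode("utf-8")           # b'na\xc3\xafve'
print([classify_byte(b) for b in encoded])  # ['ascii', 'ascii', 'lead', 'continuation', 'ascii', 'ascii']
print(truncate_utf8(encoded, 3))            # b'na' (not the broken b'na\xc3')
```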


UTF-8 isn't such a bad encoding (although it isn't ideal for fixed-pitch text; I invented a character set and encoding that would be better for fixed-pitch text). But I was not talking about the encoding; I was talking about the Unicode character set.
> UTF-8 isn't such a bad encoding (although it isn't ideal for fixed-pitch text; I invented a character set and encoding that would be better for fixed-pitch text).

This is utterly incoherent.

Can anyone explain how the statement I responded to makes sense?

I must be wrong, judging by how many people disagree.

Well, UTF-8 is an encoding of Unicode, which allows for combining characters, double-width characters and all that jazz, and those can be a bad fit for fixed-pitch text.

For example, take the output of a Zalgo text generator[1] and try to make it line up in a fixed-pitch (monospace) setting.
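To make the fixed-pitch problem concrete, here's a rough Python sketch (the `display_width` helper is hypothetical, and its rules are only a loose approximation of what `wcwidth()` implementations do): the number of code points and the number of monospace columns diverge as soon as combining marks or East Asian wide characters show up.

```python
import unicodedata

def display_width(text: str) -> int:
    """Rough estimate of monospace columns needed to show `text`.

    Heuristic (an assumption, not a standard): nonspacing/enclosing marks
    and format characters take 0 columns, East Asian wide/fullwidth
    characters take 2, everything else takes 1. Emoji sequences, tabs and
    control characters are not handled.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # e.g. U+0301 COMBINING ACUTE ACCENT stacks on the previous char
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

for s in ["cafe\u0301", "日本語", "Z\u0336\u0337\u0338"]:
    print(repr(s), "code points:", len(s), "columns:", display_width(s))
```

The Zalgo-style string is four code points but occupies a single column, while the three CJK characters need six.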

At least that's my interpretation of what he tried to convey.

[1]: http://eeemo.net/