Hacker News
by jekub 4003 days ago
But "temporary" is a thing that exists only in theory. In practice it's always either never or (almost) forever. As soon as a few applications start using this "new" form of UTF-8, some of them may have to keep supporting it forever.

Why not go directly for the pre-2003 UTF-8 encoding? It would even put a bit of pressure toward restoring them, and would show that this is the right way. I also think it is the only way to convince people to start implementing it.
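For readers who haven't seen the pre-2003 form: it is the RFC 2279 layout, which allowed sequences of up to 6 bytes covering 31 bits, where the current definition stops at 4 bytes and U+10FFFF. A minimal sketch in Go follows; the names `encodeRFC2279` and `cont` are made up for illustration, and the sketch does not reject surrogate code points or do any other checking a real encoder would need.

```go
package main

import "fmt"

// cont builds one continuation byte (10xxxxxx) from the 6 bits of cp
// that start at the given shift.
func cont(cp uint32, shift uint) byte {
	return 0x80 | byte((cp>>shift)&0x3F)
}

// encodeRFC2279 is a hypothetical helper that encodes cp using the
// pre-2003 (RFC 2279) UTF-8 layout: up to 6 bytes, 31 bits of payload.
// It returns nil if cp does not fit in 31 bits.
func encodeRFC2279(cp uint32) []byte {
	switch {
	case cp < 0x80:
		return []byte{byte(cp)}
	case cp < 0x800:
		return []byte{0xC0 | byte(cp>>6), cont(cp, 0)}
	case cp < 0x10000:
		return []byte{0xE0 | byte(cp>>12), cont(cp, 6), cont(cp, 0)}
	case cp < 0x200000:
		return []byte{0xF0 | byte(cp>>18), cont(cp, 12), cont(cp, 6), cont(cp, 0)}
	case cp < 0x4000000:
		// 5-byte form, no longer allowed by the current definition.
		return []byte{0xF8 | byte(cp>>24), cont(cp, 18), cont(cp, 12), cont(cp, 6), cont(cp, 0)}
	case cp < 0x80000000:
		// 6-byte form, no longer allowed by the current definition.
		return []byte{0xFC | byte(cp>>30), cont(cp, 24), cont(cp, 18), cont(cp, 12), cont(cp, 6), cont(cp, 0)}
	}
	return nil
}

func main() {
	fmt.Printf("% X\n", encodeRFC2279(0x20AC))     // E2 82 AC (same bytes as today's UTF-8)
	fmt.Printf("% X\n", encodeRFC2279(0x200000))   // F8 88 80 80 80
	fmt.Printf("% X\n", encodeRFC2279(0x7FFFFFFF)) // FD BF BF BF BF BF
}
```

Up to 3 bytes the output is byte-for-byte what any current encoder produces; only the 5- and 6-byte forms, and 4-byte values above U+10FFFF, are what was removed.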

1 comment

> As soon as a few applications start using this "new" form of UTF-8, some of them may have to keep supporting it forever

Not if it's used through a 3rd-party library, such as the Go implementation of UTF-88 I've provided.

> Why not go directly for the pre-2003 UTF-8 encoding? It would even put a bit of pressure toward restoring them

Because it's not a valid encoding under the current scheme, whereas using surrogates with UTF-8 is, since it uses the 2 private use planes to implement the surrogates. The goal is restoration by the Unicode Consortium, but judging by their public statements that's not going to happen easily or quickly. In the meantime we need an encoding that's valid under the current scheme, because it may need to be used for 10 or 20 years. Of course I could have used UTF-16 with a doubly-directed surrogate system, but that would be even more error-prone, and I expect whatever 2nd-level surrogate system is eventually provided with UTF-16 will be legally available with UTF-8 and UTF-16 anyway.
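To make the validity point concrete, here is a rough Go sketch. The pair layout is an assumption made purely for illustration (16 bits carried in plane 15, 16 bits in plane 16) and is not necessarily the layout UTF-88 uses; `encodePair`, `decodePair`, `hiBase` and `loBase` are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	hiBase = 0xF0000  // first code point of plane 15 (private use)
	loBase = 0x100000 // first code point of plane 16 (private use)
)

// encodePair splits a 32-bit value into two private-use code points,
// 16 bits each. Assumed layout for illustration only. Values whose
// halves end in FFFE or FFFF land on noncharacters, which a real
// scheme would probably want to avoid.
func encodePair(v uint32) (hi, lo rune) {
	return rune(hiBase + (v >> 16)), rune(loBase + (v & 0xFFFF))
}

// decodePair reverses encodePair.
func decodePair(hi, lo rune) uint32 {
	return (uint32(hi-hiBase) << 16) | uint32(lo-loBase)
}

func main() {
	v := uint32(0x200000) // beyond U+10FFFF

	// The pair is two ordinary code points, so it encodes as
	// ordinary 4-byte UTF-8 sequences.
	hi, lo := encodePair(v)
	s := string([]rune{hi, lo})
	fmt.Printf("%U %U valid=%v\n", hi, lo, utf8.ValidString(s))
	fmt.Printf("round trip: %#x\n", decodePair(hi, lo))

	// The same value as a pre-2003 5-byte sequence is rejected
	// by a current validator.
	old := []byte{0xF8, 0x88, 0x80, 0x80, 0x80}
	fmt.Printf("pre-2003 bytes valid=%v\n", utf8.Valid(old))
}
```

Go's `unicode/utf8` reports the pair as valid and the 5-byte sequence as invalid, which is the difference that matters if the data has to pass through existing tools unchanged.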

UTF-88 is an attempt to showcase both a surrogation scheme implementable in current UTF-16 and the fact that UTF-8 is the best encoding.