by conaclos 989 days ago
Thanks for sharing!

This is the first time I have seen WTF-8 [1] and WTF-16 mentioned. Given the spec's own description, it seems strange to use this interoperability "hack" (the spec's word) as the foundation of the string proposal. I wonder whether they could use UTF-8 instead and keep WTF-16 for interoperability with JavaScript.

> WTF-8 [...] is a superset of UTF-8 that encodes surrogate code points if they are not in a pair. It represents, in a way compatible with UTF-8, text from systems such as JavaScript and Windows that use UTF-16 internally but don’t enforce the well-formedness invariant that surrogates must be paired. WTF-8 is a hack intended to be used internally in self-contained systems with components that need to support potentially ill-formed UTF-16 for legacy reasons.
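
To make the quoted definition concrete, here is a rough Python sketch of how I read it, not a reference implementation. It assumes the input is a list of 16-bit code units such as a JavaScript string might hold; `utf16_units_to_wtf8` is a name I made up for illustration. Well-formed surrogate pairs are combined into one code point, and only unpaired surrogates get the special 3-byte treatment, which Python's `surrogatepass` error handler happens to provide.

```python
def utf16_units_to_wtf8(units):
    """Encode 16-bit code units (possibly ill-formed UTF-16) as WTF-8-style bytes.

    Hypothetical helper for illustration only.
    """
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_lead = 0xD800 <= u <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_lead and has_trail:
            # Well-formed pair: combine into a single supplementary code point.
            trail = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (trail - 0xDC00))
            i += 2
        else:
            # Ordinary BMP code point or an unpaired surrogate: keep as is.
            code_points.append(u)
            i += 1
    # "surrogatepass" lets the remaining lone surrogates through as
    # 3-byte sequences instead of raising UnicodeEncodeError.
    return "".join(map(chr, code_points)).encode("utf-8", "surrogatepass")


well_formed = utf16_units_to_wtf8([0xD83D, 0xDE00])  # a proper surrogate pair
ill_formed = utf16_units_to_wtf8([0xD83D])           # a lone lead surrogate
print(well_formed.hex(), ill_formed.hex())
```

For well-formed input the bytes are identical to plain UTF-8, which is the "superset" part; strict UTF-8 rejects the lone surrogate outright (`"\ud83d".encode("utf-8")` raises `UnicodeEncodeError` in Python).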

[1] https://simonsapin.github.io/wtf-8/