by the_mitsuhiko
1303 days ago
> I am having a hard time to understand how so many people prefer this as the outcome of a Web / polyglot / language neutral standard, especially since affected languages cannot change, and the problem is so trivially avoided, say with a boolean flag "this end is WTF-16, it's OK if the other end is as well" (otherwise use well-formed/UTF-8 semantics).

It's because the idea that languages "cannot change" does not appear to be true. UTF-8 is so widespread now that changing a language's native string representation to it has become an interesting proposition. Many modern languages (e.g. Go and Rust) already picked UTF-8; others, such as Swift, changed over to it. Then there are language implementations such as PyPy (for Python) that changed their internal encoding even though it was widely assumed that this could not work.

The web is also not WTF-16. JavaScript is, and the web consists of more than just JavaScript. WTF-16 to WTF-16 is most likely becoming less and less of a thing going forward, except for legacy interfaces such as the W APIs on Windows, and even there UTF-8 at the code page level now appears to be strongly recommended.

To give you another example: I'm very interested in using AssemblyScript today to do data processing, but that is actually not all that easy because the data I need to process is in UTF-8. To use the string class in AssemblyScript, I have to do a pointless data conversion to WTF-16 and back.

I would be majorly surprised if JavaScript doesn't adopt UTF-8 at some point as well.
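To make the "conversion to WTF-16 and back" concrete, here is a rough sketch in Python rather than AssemblyScript. The helper names (`utf8_via_utf16`, `count_commas`, `is_well_formed`) and the sample payload are made up for illustration; this is not how any particular runtime implements its strings. It only shows that the round trip produces the same bytes you started with, while work such as counting ASCII delimiters can be done directly on the UTF-8 input.

```python
def utf8_via_utf16(data: bytes) -> tuple[bytes, bytes]:
    """Round-trip UTF-8 input through a UTF-16 buffer.

    Returns (utf16_buffer, utf8_again). This mimics what a 16-bit string
    type forces on UTF-8 input; it is an illustration, not a real runtime.
    """
    text = data.decode("utf-8")             # validate and decode the input
    utf16 = text.encode("utf-16-le")        # the 16-bit representation
    utf8_again = utf16.decode("utf-16-le").encode("utf-8")
    return utf16, utf8_again


def count_commas(data: bytes) -> int:
    """Count ASCII commas directly on the UTF-8 bytes, no conversion needed.

    This is safe because bytes of multi-byte UTF-8 sequences are never
    in the ASCII range.
    """
    return data.count(b",")


def is_well_formed(text: str) -> bool:
    """True if the string has no lone surrogates (can be strict UTF-8)."""
    try:
        text.encode("utf-8")                # strict mode rejects lone surrogates
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample record: Latin, CJK, and a non-BMP character.
PAYLOAD = "naïve,日本語,😀".encode("utf-8")
```

The last helper touches on the point in the quoted comment: a lone surrogate such as `"\ud800"` can sit in a WTF-16 string but has no well-formed UTF-8 encoding, which is why a boundary between the two has to either reject or replace it.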