by dcode
1304 days ago
That's not the issue or the impression. The issue is that well-formed UTF-16 is rarely used in practice. JavaScript, Java, C#, Dart, Kotlin, etc. all effectively use WTF-16 for compatibility and performance reasons, and that is what is semantically distinct from UTF-8. The two have asymmetric value spaces (WTF-16 permits lone surrogates, UTF-8 does not), so strings from these "legacy" languages would sometimes throw or be implicitly mutated when crossing a UTF-8 boundary. Mixed systems typically use WTF-8 rather than UTF-8 as the common denominator for this reason, but Wasm decided against it.
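
To make the asymmetry concrete, here is a minimal Python sketch. The function names are mine, purely for illustration. Python's `surrogatepass` error handler is used as a stand-in for WTF-8; it only matches WTF-8 for lone surrogates (it would also encode paired surrogates separately, which real WTF-8 forbids), so treat it as an approximation.

```python
# A lone high surrogate: legal as a WTF-16 code unit sequence,
# but not well-formed UTF-16 and not representable in UTF-8.
lone = "\ud800"


def to_utf8_strict(s: str) -> bytes:
    """Strict UTF-8: raises UnicodeEncodeError on a lone surrogate ("throw")."""
    return s.encode("utf-8")


def to_utf8_lossy(s: str) -> bytes:
    """Lossy UTF-8: silently substitutes the surrogate ("implicitly mutate")."""
    return s.encode("utf-8", errors="replace")


def to_wtf8_like(s: str) -> bytes:
    """WTF-8-like: encodes the lone surrogate as a generalized 3-byte sequence."""
    return s.encode("utf-8", errors="surrogatepass")


def from_wtf8_like(b: bytes) -> str:
    """Inverse of to_wtf8_like for lone surrogates."""
    return b.decode("utf-8", errors="surrogatepass")


# The lossy path changes the string; the WTF-8-like path round-trips it.
assert from_wtf8_like(to_wtf8_like(lone)) == lone
assert to_utf8_lossy(lone).decode("utf-8") != lone
```

The strict variant is the "throw" case and the lossy variant is the "mutate" case; only the WTF-8-like path preserves the original value.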
Even Python, which adopted the pretty ridiculous internal UCS-4 encoding, now carries around a pre-encoded UTF-8 version of strings for crossing boundaries. UTF-8 is just too widespread for many languages to avoid supporting it natively in some form.
Likewise, Wasm is not the first standard with opinions about string encodings that are not native to a language. For instance, Go and Rust, which prefer UTF-8 internally, (usually) have to re-encode on the way to Windows APIs. Similarly, Cocoa/Objective-C traditionally uses UCS-2 strings, which are quite leaky, yet Swift nowadays uses UTF-8 internally and transcodes.
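
For a feel of what that boundary re-encoding amounts to, here is a rough Python sketch of a UTF-8 to UTF-16LE round trip of the kind needed before calling a wide-character API. The helper names are hypothetical, and the two-byte NUL terminator is my assumption about what the callee expects; this is not how Go, Rust, or Swift actually implement it.

```python
def utf8_to_wide(utf8_bytes: bytes) -> bytes:
    """Transcode strict UTF-8 to NUL-terminated UTF-16LE (assumed callee format)."""
    text = utf8_bytes.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return text.encode("utf-16-le") + b"\x00\x00"


def wide_to_utf8(wide: bytes) -> bytes:
    """Transcode UTF-16LE back to UTF-8, dropping one trailing NUL code unit."""
    if len(wide) >= 2 and wide[-2:] == b"\x00\x00":
        wide = wide[:-2]
    # Strict decode: a lone surrogate coming back from the other side raises
    # UnicodeDecodeError here, which is exactly the leak discussed above.
    return wide.decode("utf-16-le").encode("utf-8")


original = "é😀".encode("utf-8")
assert wide_to_utf8(utf8_to_wide(original)) == original
```

Well-formed text round-trips fine; the cost is the extra copy on every crossing, plus the strict decode failing if the other side hands back ill-formed UTF-16.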