|
|
|
|
|
by georgemandis
34 days ago
|
|
I'm realizing `encodeURIComponent` is actually part of the ECMA spec! I thought it was something provided by the browser like `window` or `document`. I withdraw my "the language handled it fine" comment, haha. Before I'd looked that up I was going to say: I feel like "don't allow an invalid Unicode string to exist all" feels like a separate/bigger problem to me from "handling it fine" when they do get created. To the extent I can hand JavaScript an invalid combination of code units in a variety of other scenarios, returning a � felt fine. e.g.
// valid
String.fromCodePoint(0xd83e, 0xdd20)
// invalid, but "�" is ... fine?
String.fromCodePoint(0xdd20, 0xd83e) |
|
JavaScript’s problems start with not having separate code unit or scalar value types. Sequences of UTF-16 code units, individual UTF-16 code units and scalar values all use the type string. (Code unit and scalar value also both use number in some contexts.)
The first step to fixing JavaScript’s bad semantics would be separating the code unit and scalar value types. If you did that… the changes required to support strict strings are perhaps surprisingly small. Even migrating to UTF-8 semantics is not very hard then.
Unfortunately, JavaScript seems very determined to do stupid things and allow stupid things and then do more stupid things with the stupid things it foolishly allowed.