|
Kind of, sort of, not really. What they imply (by using the term "ASCII" here) is not correct, and I'm not sure how the assurance that the string does not contain astral characters helps them split a string by the `.` character. But JavaScript doesn't exactly "smooth over this" in a very useful way, either. For legacy reasons, the basic component of a JavaScript string is a UTF-16 code unit, that is, sixteen bits interpreted as UTF-16-encoded data. However, sixteen bits are not enough to represent every valid Unicode character. Instead, characters in the [supplemental planes](https://en.wikipedia.org/wiki/Plane_(Unicode)) (see planes 1 to 16) are represented in UTF-16 using two sixteen-bit surrogate code units, which do not individually represent any character, but in combination reference a Unicode code point in one of the supplemental planes. JavaScript's representation of strings, as well as the APIs it exposes for dealing with strings, such as index access and string length, treat each sixteen-bit half of such a surrogate pair as an individual character. This means that, when you index a string, you might get a UTF-16 code unit that represents a Unicode code point in the basic plane, or a lone surrogate that, along with its other half, would represent a code point in one of the supplemental planes. |
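To make the point above concrete, here is a minimal sketch. The sample string and the variable names are my own illustration, not something from the article. It uses U+1F600, a code point in plane 1, to compare what the code-unit-based APIs report with what the code-point-aware ones report.

```javascript
// "a", then U+1F600 (a plane 1 code point), then "b".
const s = "a\u{1F600}b";

const report = {
  // .length counts UTF-16 code units, so the emoji counts twice.
  length: s.length,
  // Indexing by code unit exposes the two surrogate halves separately.
  highSurrogate: s.charCodeAt(1).toString(16),
  lowSurrogate: s.charCodeAt(2).toString(16),
  // codePointAt() combines a surrogate pair starting at the given index.
  codePoint: s.codePointAt(1).toString(16),
  // The string iterator (used by spread) walks code points, not code units.
  codePointCount: [...s].length,
  // Splitting on "." is unaffected: "." never appears inside a surrogate pair.
  parts: "a.\u{1F600}.b".split("."),
};

console.log(report);
```

Note that splitting on `.` works the same whether or not the string contains astral characters, which is why that assurance does not seem to help with the split.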
That's great feedback! After reading your comment and re-reading that section of the article, it does indeed sound wrong, so I decided to remove the paragraph. Your explanation of the string representation is really good. Thanks for sharing!