by njuw
775 days ago
They're UTF-16, and substr(), length, etc. work at the code unit level. Hence, the above isn't actually valid for all strings: any character whose code point lies between U+10000 and U+10FFFF requires 2 code units (a surrogate pair) [1]. For example, U+10429 Deseret Small Letter Long E [2]:
> '𐐩'.substr(0, 1)
'\ud801'
> '𐐩'.length
2
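If you need to count or slice by code point instead, one option is to go through the string iterator, which yields whole code points rather than code units. A minimal sketch, assuming ES2015 or later; codePointLength and codePointSlice are hypothetical helper names, not built-ins:

```javascript
// Hypothetical helper: count code points rather than UTF-16 code units.
function codePointLength(str) {
  // Array.from uses the string iterator, which keeps surrogate pairs together.
  return Array.from(str).length;
}

// Hypothetical helper: slice by code point index without splitting a surrogate pair.
function codePointSlice(str, start, end) {
  return Array.from(str).slice(start, end).join('');
}

const s = '𐐩'; // U+10429, stored as the two code units \ud801 \udc29

console.log(s.length);                       // 2 (code units)
console.log(codePointLength(s));             // 1 (code point)
console.log(codePointSlice(s, 0, 1) === s);  // true
console.log(s.codePointAt(0).toString(16));  // 10429
```

Note that this only works at the code point level; a character built from several code points (combining marks, some emoji) will still count as more than one.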
[1] https://en.wikipedia.org/wiki/UTF-16#Description
[2] https://codepoints.net/U+10429