by stormbrew
4662 days ago
Wish I'd known about this when I was pointing out in another HN thread how UTF-16 is a terrible encoding because, among other reasons, it pushes the corner case where you find out your encoding/decoding is broken to the very edge of likelihood. It's ridiculous that V8 doesn't properly support UTF-16, but it's to be expected, I suppose. UTF-8 does not have this problem: any non-ASCII character already takes more than one byte, so broken handling shows up early. That's the way we should be moving.
JS's treatment of strings is even wackier than you might think: it is neither really UCS-2 nor UTF-16. Engines are semi-required to use UTF-16 representations of strings internally, but the API surface that is exposed to JS code makes them look like UCS-2 strings (i.e. it has no awareness of surrogate pairs and treats each half as its own character). However, if you stick a JS string into something that is UTF-16 aware, such as a DOM node, then the surrogate pairs will display correctly.
See [1] for a very clear explanation of this muddy subject.
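To make the UCS-2-looking API concrete, here's a minimal sketch using U+1D306 as the sample astral character. The variable names and the `codePointFromPair` helper are mine for illustration, not anything from the spec or an engine, and it sticks to ES5-era string methods:

```typescript
// U+1D306 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
// Written here as two escapes, which is how the string API sees it.
const tetragram: string = "\uD834\uDF06";

// length counts 16-bit code units, not characters.
const codeUnits: number = tetragram.length;

// charCodeAt hands back each surrogate half separately.
const highSurrogate: number = tetragram.charCodeAt(0);
const lowSurrogate: number = tetragram.charCodeAt(1);

// Hypothetical helper: recombine a surrogate pair into the code point
// using the standard UTF-16 decoding arithmetic.
function codePointFromPair(high: number, low: number): number {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

console.log(codeUnits);                          // 2
console.log(highSurrogate.toString(16));         // d834
console.log(lowSurrogate.toString(16));          // df06
console.log(
  codePointFromPair(highSurrogate, lowSurrogate).toString(16)
);                                               // 1d306
```

So one visible character reports a length of 2, and any code that slices or reverses by index can split the pair. You only notice once an astral character actually turns up.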
[0] http://www.ecma-international.org/ecma-262/5.1/#sec-8.4
[1] http://mathiasbynens.be/notes/javascript-encoding