Hacker News
by shawnz 4659 days ago
> Unfortunately for us, Javascript has never been updated to support UTF-16. Instead it continues to treat strings as UCS-2.

So they were parsing the JSON as if it were UTF-16, when really it was UCS-2. How is that an error in Node?
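
To make the UCS-2 point concrete: a JavaScript string is exposed as a sequence of 16-bit code units, so a character outside the Basic Multilingual Plane counts as two of them. A minimal TypeScript sketch; `codeUnits` and `codePoints` are illustrative helpers, not part of any library.

```typescript
// U+1F4A9 lies outside the BMP, so it is stored as a surrogate pair.
const pile = "\uD83D\uDCA9";

// Code-unit view (what .length and charCodeAt expose): two 16-bit units.
function codeUnits(s: string): number[] {
  const units: number[] = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i));
  }
  return units;
}

// Code-point view: Array.from iterates the string by whole code points.
function codePoints(s: string): number[] {
  return Array.from(s, (ch) => ch.codePointAt(0)!);
}

console.log(pile.length); // 2
console.log(codeUnits(pile).map((u) => u.toString(16)).join(" ")); // d83d dca9
console.log(codePoints(pile).map((c) => c.toString(16)).join(" ")); // 1f4a9
```

The `.length` result is the UCS-2-style view; only the iteration-based helper sees a single character.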


JSON is defined as UTF-8, UTF-16, or UTF-32 [1]. The escaped characters are UTF-16, not UCS-2. It is unfortunate that JavaScript can't parse it correctly!

[1] http://www.ietf.org/rfc/rfc4627.txt
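
The escape rule in [1] is easy to check: a non-BMP character is written as a twelve-character escape of its UTF-16 surrogate pair (the RFC's own example is the G clef, U+1D11E, as "\uD834\uDD1E"), and recovering the code point is a bit of arithmetic. A TypeScript sketch; `fromSurrogatePair` is a hypothetical helper.

```typescript
// Combine a UTF-16 high/low surrogate pair into a single code point.
function fromSurrogatePair(high: number, low: number): number {
  if (high < 0xd800 || high > 0xdbff) throw new RangeError("not a high surrogate");
  if (low < 0xdc00 || low > 0xdfff) throw new RangeError("not a low surrogate");
  return ((high - 0xd800) << 10) + (low - 0xdc00) + 0x10000;
}

// JSON text holding the escaped pair for U+1D11E (MUSICAL SYMBOL G CLEF).
const text = '"\\ud834\\udd1e"';
const parsed: string = JSON.parse(text);

console.log(fromSurrogatePair(0xd834, 0xdd1e).toString(16)); // 1d11e
console.log(parsed.codePointAt(0)!.toString(16)); // 1d11e
console.log(parsed.length); // 2
```

For a well-formed pair, `JSON.parse` yields the same two code units JavaScript would use for the character anyway, which is why the two views usually agree; they only diverge once a pair is broken.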

This is true of JSON, but it's not true of JavaScript, which gives no fucks about UTF-16 (or valid surrogate pairs). It's a very strange world where JSON and JavaScript have incompatible interpretations of strings.

http://mathiasbynens.be/notes/javascript-encoding
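
To illustrate the "valid surrogate pairs" point: JavaScript will store a lone or misordered surrogate without complaint, so checking well-formedness is left to the program. A minimal TypeScript sketch; `isWellFormedUtf16` is a hypothetical helper (recent engines also provide `String.prototype.isWellFormed`, added in ES2024, for the same check).

```typescript
// True if every surrogate in s belongs to a correctly ordered pair,
// i.e. s is well-formed UTF-16. JavaScript itself never enforces this.
function isWellFormedUtf16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff) {
      // High surrogate: must be immediately followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low half we just validated
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

// All three are perfectly legal JavaScript strings.
console.log(isWellFormedUtf16("\ud834\udd1e")); // true  (valid pair)
console.log(isWellFormedUtf16("\ud834")); // false (lone high surrogate)
console.log(isWellFormedUtf16("\udd1e\ud834")); // false (pair in wrong order)
```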

Not really, as JSON is not strictly a subset of JavaScript and requires its own parser. It's based on JavaScript, but it is not JavaScript.

I was skeptical, but I did some searching, and you appear to be right! The difference seems to come down to string handling:

http://timelessrepo.com/json-isnt-a-javascript-subset
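
The string-handling gap comes down to U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR: JSON allows them raw inside strings, but JavaScript treated them as line terminators in string literals until ES2019 made JSON a syntactic subset. A sketch of the usual workaround when embedding JSON in script source for older engines; `jsonForScript` is a hypothetical helper, not a standard API.

```typescript
// JSON.stringify leaves U+2028/U+2029 unescaped, so escape them by hand
// before pasting the result into JavaScript source (e.g. a <script> tag).
function jsonForScript(value: unknown): string {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const data = { note: "a\u2028b" };
const raw = JSON.stringify(data); // contains the raw U+2028 character
const safe = jsonForScript(data); // contains the six characters \u2028

console.log(raw.includes("\u2028")); // true
console.log(safe.includes("\u2028")); // false
console.log(JSON.parse(safe).note === data.note); // true
```

The escaped form is still valid JSON and parses back to the same string, so the change is safe for JSON consumers as well.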

Ha, same article where I first learned this.

They wanted to parse some bytes as UTF-16 but were unable to, because V8 only understands UCS-2 (including invalid surrogate pairs). This is a major problem with Node: it happily produces and consumes strings that are not valid Unicode encodings.
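
For contrast, a decoder that treats its input as real UTF-16 rather than UCS-2 has to check surrogate pairing as it goes. A minimal sketch, assuming little-endian input and choosing to throw on malformed data (a lenient decoder could substitute U+FFFD instead); `decodeUtf16leStrict` is a hypothetical name, not a Node or V8 API.

```typescript
// Decode UTF-16LE bytes, rejecting malformed surrogates instead of passing
// them through the way a UCS-2 decoder would.
function decodeUtf16leStrict(bytes: Uint8Array): string {
  if (bytes.length % 2 !== 0) throw new RangeError("odd number of bytes");
  const units: number[] = [];
  for (let i = 0; i < bytes.length; i += 2) {
    units.push(bytes[i] | (bytes[i + 1] << 8)); // little-endian code unit
  }
  let out = "";
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (u >= 0xd800 && u <= 0xdbff) {
      // High surrogate: the next unit must be a low surrogate.
      const low = i + 1 < units.length ? units[i + 1] : -1;
      if (low < 0xdc00 || low > 0xdfff) {
        throw new RangeError(`unpaired high surrogate at unit ${i}`);
      }
      out += String.fromCharCode(u, low);
      i++;
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      throw new RangeError(`unpaired low surrogate at unit ${i}`);
    } else {
      out += String.fromCharCode(u);
    }
  }
  return out;
}

// "A" followed by U+1D11E, encoded as UTF-16LE.
const good = new Uint8Array([0x41, 0x00, 0x34, 0xd8, 0x1e, 0xdd]);
console.log(decodeUtf16leStrict(good).length); // 3 (one BMP unit + a surrogate pair)

// A lone high surrogate (0xD834) with nothing after it.
const bad = new Uint8Array([0x34, 0xd8]);
try {
  decodeUtf16leStrict(bad);
} catch (e) {
  console.log((e as Error).message); // unpaired high surrogate at unit 0
}
```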