Hacker News new | ask | show | jobs
by dgreensp 4948 days ago
The specific problems the author describes don't seem to be present today; perhaps they were fixed. That's not to say this conversions aren't a source of issues, just that I don't see any show-stopper problems currently in Node, V8, or JavaScript.

In JavaScript, a string is a series of UTF-16 code units, so the smiley face is written '\ud83d\ude04'. This string has length 2, not 1, and behaves like a length-2 string as far as regexes, etc., which is too bad. But even though you don't get the character-counting APIs you might want, the JavaScript engine knows this is a surrogate pair and represents a single code point (character). (It just doesn't do much with this knowledge.)

You can assign '\ud83d\ude04' to document.body.innerHTML in modern Chrome, Firefox, or Safari. In Safari you get a nice Emoji; in stock Chrome and Firefox, you don't, but the empty space is selectable and even copy-and-pastable as a smiley! So the character is actually there, it just doesn't render as a smiley.

The bug that may have been present in V8 or Node is: what happens if you take this length-2 string and write it to a UTF8 buffer, does it get translated correctly? Today, it does.

What if you put the smiley directly into a string literal in JS source code, not \u-escaped? Does that work? Yes, in Chrome, Firefox, and Safari.

1 comments

The invisible smiley was a font system problem, fixed in Firefox 19 Aurora (assuming you're on Mac).

https://bugzilla.mozilla.org/show_bug.cgi?id=715798