

by mryall 4993 days ago

I hate the implication in this comment and in the linked article that the spec is somehow immutable. The ECMAScript spec here is fundamentally flawed with regard to character encoding and needs to be fixed.

UCS-2 is no longer a valid Unicode encoding, because Unicode now assigns many characters outside the Basic Multilingual Plane (BMP) and UCS-2 cannot represent them. The spec should be updated to require UTF-16 support in all implementations.
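
To make the difference concrete, here is a rough sketch of how UTF-16 represents a code point above U+FFFF as a surrogate pair, which is exactly what UCS-2 has no way to do. The helper name `toUtf16CodeUnits` is made up for illustration and is not part of any spec or library.

```javascript
// Hypothetical helper (illustration only): split a Unicode code point
// into the 16-bit code units UTF-16 uses to store it.
function toUtf16CodeUnits(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("not a Unicode code point");
  }
  // BMP code points fit in one code unit. (Values in D800..DFFF are
  // surrogates rather than characters; this sketch passes them through.)
  if (codePoint <= 0xFFFF) {
    return [codePoint];
  }
  // Supplementary planes: subtract 0x10000, then split the remaining
  // 20 bits into a high (lead) and a low (trail) surrogate.
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);
  const low = 0xDC00 + (offset & 0x3FF);
  return [high, low];
}

// U+1D306 is outside the BMP, so it needs two code units.
console.log(toUtf16CodeUnits(0x1D306).map((u) => u.toString(16))); // [ 'd834', 'df06' ]
```

A UCS-2-only implementation sees those two units as two unrelated 16-bit values instead of one character.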

If a modern programming language like JavaScript doesn't provide a way to represent characters outside the BMP in its character data type, that needs to be fixed too. Indexing and counting characters in a JavaScript string need to reflect the human and Unicode notion of characters, not the arbitrary 2-byte blocks that UCS-2 happens to use.
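
A small sketch of the indexing and counting problem, assuming an engine with ES2015 or later (the string iterator and `codePointAt` used here are newer than the behaviour being complained about). The function names `countCodeUnits` and `countCodePoints` are invented for this example.

```javascript
// U+1D306 lies outside the BMP, so the string holds a surrogate pair.
const s = "\uD834\uDF06";

// Hypothetical helper: what String.prototype.length reports (16-bit code units).
function countCodeUnits(str) {
  return str.length;
}

// Hypothetical helper: count code points by walking the string iterator,
// which treats a valid surrogate pair as a single item.
function countCodePoints(str) {
  return Array.from(str).length;
}

console.log(countCodeUnits(s));              // 2
console.log(countCodePoints(s));             // 1
console.log(s.charCodeAt(0).toString(16));   // d834 (half of the character)
console.log(s.codePointAt(0).toString(16));  // 1d306 (the whole character)
```

Even counting code points is only part of the way to the human notion of a character, since combining sequences can span several code points, but it does get rid of the split-surrogate problem.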

The language authors should be ashamed of this situation - having a modern language without proper Unicode support is simply awful.