by pbiggar
4959 days ago
A couple of reasons why it makes sense for V8 and other vendors to use UCS2:
- The spec says UCS2 or UTF16. Those are the only options.
- UCS2 allows random access to characters, UTF-16 does not.
- Remember how the JS engines were fighting for speed on arbitrary benchmarks, and nobody cared about anything else for 5 years? UCS2 helps string benchmarks be fast!
- Changing from UCS2 to UTF-16 might "break the web", something browser vendors hate (and so do web developers).
- Java was UCS2. Then Java 5 changed to UTF-16. Why didn't JS change to UTF-16? Because a Java VM only has to run one program at once! In JS, you can't specify a version, an encoding, and one engine has to run everything on the web. No migration path to other encodings!
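To make the random-access point concrete, here is a minimal TypeScript sketch, not anything taken from V8. It assumes a sample string containing one character outside the Basic Multilingual Plane (U+1F600, written as its surrogate pair). The helper `countCodePoints` is a hypothetical name introduced for illustration only.

```typescript
// "a", U+1F600 (stored as the surrogate pair D83D DE00), "b"
const s: string = "a\uD83D\uDE00b";

// Hypothetical helper: counts code points by scanning for surrogate pairs.
// This is O(n), which is the cost a UTF-16-aware "character count" has to pay.
function countCodePoints(str: string): number {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHighSurrogate = unit >= 0xd800 && unit <= 0xdbff;
    if (isHighSurrogate && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      // A high surrogate followed by a low surrogate is one code point.
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
    count++;
  }
  return count;
}

// length and charCodeAt work on 16-bit code units, so they are O(1).
const codeUnitLength: number = s.length;          // 4 code units
const unitAtIndex1: number = s.charCodeAt(1);     // 0xD83D, half of a pair
const codePointLength: number = countCodePoints(s); // 3 code points
```

Indexing by code unit stays constant-time, but index 1 lands on half of a character. Getting the "third character" in the UTF-16 sense needs a scan from the start.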
I'm not sure if that's really true. On IBM's site, they define 3 levels of UCS-2, only one of which excludes "combining characters" (really code points).
http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%...
If you have combining characters, then you can't simply take the number of bytes and divide by 2 to get the number of letters. If you don't have combining characters, then you have something which isn't terribly useful except for European languages (I think?)
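A rough TypeScript sketch of the bytes-divided-by-2 problem, under a big simplifying assumption: it only treats the Combining Diacritical Marks block (U+0300 to U+036F) as combining characters, which is far from the full set. The helper `countBaseLetters` and the sample word are hypothetical, just for illustration.

```typescript
// "café" spelled with a decomposed é: "e" followed by U+0301 (combining acute accent)
const word: string = "cafe\u0301";

// Hypothetical helper: counts code units that are NOT in the
// Combining Diacritical Marks block (U+0300..U+036F).
// Assumption: this one block stands in for all combining characters.
function countBaseLetters(str: string): number {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isCombiningMark = unit >= 0x0300 && unit <= 0x036f;
    if (!isCombiningMark) count++;
  }
  return count;
}

const byteLength: number = word.length * 2;       // 10 bytes at 2 bytes per code unit
const naiveLetterCount: number = byteLength / 2;  // 5, counts the accent as a letter
const letterCount: number = countBaseLetters(word); // 4, what a reader would see
```

So even with every code unit being exactly 2 bytes, bytes / 2 gives 5 for a word that displays as 4 letters.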
Maybe someone more familiar with the implementation can describe which path they actually went down for this... given what I've heard so far, I'm not optimistic.