by shellac 1544 days ago
Yes, that won't change anything internal. However:

> ...JVM's internal string representation is UTF-16

Hasn't been true for a while. They switched to using a byte array internally for storage, plus an encoding flag. Currently that's either UTF-16 or Latin 1, unless compact strings are disabled, in which case it's all UTF-16.

1 comments

You're talking about implementation details of java.lang.String. The interface it exposes is still UTF-16.

Latin 1 has the special property that each of its fixed-width code units maps onto a single UTF-16 code unit. It is for that reason alone that CharSequence implementors can use it as an alternative to UTF-16. Imagine trying to implement `char charAt(int index)` if you're backed by a UTF-8 byte array (or UTF-32, for that matter)!
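To make that concrete, here is a rough sketch of a Latin 1 backed `CharSequence`. `Latin1Sequence` is a made-up name for illustration and is not how the JDK does it internally; the point is only that `charAt` is a single array lookup because every byte is exactly one UTF-16 code unit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class for illustration; not the JDK's internal implementation.
final class Latin1Sequence implements CharSequence {
    private final byte[] bytes;

    Latin1Sequence(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the sequence stays immutable
    }

    @Override
    public int length() {
        return bytes.length; // one byte per UTF-16 code unit
    }

    @Override
    public char charAt(int index) {
        // Mask so the byte is read as unsigned: Latin 1 0x00..0xFF is U+0000..U+00FF.
        return (char) (bytes[index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new Latin1Sequence(Arrays.copyOfRange(bytes, start, end));
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

With a UTF-8 backing array the same `charAt` would have to scan from the start (or keep a side index), since a code point can take one to four bytes and anything outside the BMP has to be reported as two `char` values.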

From a programmer's perspective, Java is pretty much as UTF-16 as ever.
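A quick way to see this from the outside, whatever the internal storage happens to be: a character outside the BMP still counts as two `char` values through the `String` API. The class and helper names below are just for the example.

```java
// Hypothetical demo class; only standard java.lang.String methods are used.
final class Utf16Surface {
    // 'a' followed by U+1F600, written as an explicit surrogate pair.
    static final String SAMPLE = "a\uD83D\uDE00";

    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(SAMPLE));   // prints 3
        System.out.println(codePoints(SAMPLE));  // prints 2
        // charAt(1) returns the first half of the surrogate pair, not a whole character.
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1))); // prints true
    }
}
```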