by ghusbands 2809 days ago
Java's string handling is also broken by default in a few ways: because it historically used UCS-2 internally, it still allows surrogate pairs to be split up, which produces invalid Unicode strings.
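
To make the surrogate problem concrete, here is a minimal sketch. The class and method names (SurrogateSplit, truncateUnits, truncateSafe) are made up for illustration, and truncateSafe assumes a non-negative limit. It shows how a plain substring call, which counts UTF-16 code units, can cut a pair in half, and one possible way to avoid that:

```java
public class SurrogateSplit {
    // Naive truncation: substring counts UTF-16 code units,
    // so the cut can land in the middle of a surrogate pair.
    public static String truncateUnits(String s, int maxUnits) {
        return s.length() <= maxUnits ? s : s.substring(0, maxUnits);
    }

    // Hypothetical safer variant: if the cut would fall between a high
    // and a low surrogate, back off by one code unit.
    public static String truncateSafe(String s, int maxUnits) {
        if (s.length() <= maxUnits) {
            return s;
        }
        int end = maxUnits;
        if (end > 0
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // "a", then U+1F600 encoded as a surrogate pair, then "b"
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // 4 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        String broken = truncateUnits(s, 2);
        // The last char is a lone high surrogate, so the string is not valid Unicode.
        System.out.println(Character.isHighSurrogate(broken.charAt(broken.length() - 1)));

        System.out.println(truncateSafe(s, 2)); // a
    }
}
```

The back-off approach is only one option. Working in code points (for example with String.offsetByCodePoints) is another, at the cost of a linear scan.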
1 comment

I have not personally encountered this problem, but it's definitely there. The other historical problem is that Java didn't require clients to specify an encoding when moving between strings and bytes, so the platform default charset was used silently. That's been cleaned up quite a bit in recent releases of the JDK.
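
As a rough sketch of what specifying the encoding looks like in practice: the class and method names here (ExplicitCharset, encode, decode) are made up for illustration, while StandardCharsets comes from java.nio.charset (available since Java 7). The last part shows the kind of mojibake you get when bytes are decoded with the wrong charset:

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Always name the charset instead of relying on the platform default.
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "caf\u00E9"; // "café", written with an escape to avoid source-encoding issues

        byte[] utf8 = encode(s);
        System.out.println(utf8.length);              // 5: the é takes two bytes in UTF-8
        System.out.println(decode(utf8).equals(s));   // true: round trip is lossless

        // Decoding the same bytes with a different charset silently corrupts the text.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(s));          // false
    }
}
```

The no-argument s.getBytes() and new String(bytes) overloads still exist, which is why this kind of bug tends to show up only when code moves between machines with different defaults.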

All things considered, Java's character handling was an enormous improvement over that of the languages that preceded it, and it is still better than the implementations in many other languages. (I wish the same could be said of date handling.)