by chuckadams 31 days ago
> surrogates, regardless of whether they’re paired, are invalid in UTF-8

Java did not get the memo. Since the char type is fixed at 16 bits, it uses surrogates to encode everything outside the BMP, regardless of the encoding.
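For anyone who wants to poke at this, here is a minimal sketch of where the surrogates actually show up. The class and method names are made up for illustration. As far as I know, the in-memory `String` always holds the pair, `getBytes(StandardCharsets.UTF_8)` emits an ordinary 4-byte sequence, and `DataOutputStream.writeUTF` (the JDK's "modified UTF-8") encodes each surrogate separately as 3 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
public class SurrogateEncodingDemo {
    // U+1F600 lies outside the BMP, so a Java String stores it as a surrogate pair.
    static final String GRINNING = new String(Character.toChars(0x1F600));

    // Number of 16-bit char units in the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Byte length when encoded with the standard UTF-8 charset.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte length of the modified UTF-8 payload written by writeUTF,
    // excluding the 2-byte length prefix that writeUTF puts in front.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        System.out.println("char units:     " + charUnits(GRINNING));
        System.out.println("standard UTF-8: " + standardUtf8Length(GRINNING) + " bytes");
        System.out.println("modified UTF-8: " + modifiedUtf8Length(GRINNING) + " bytes");
    }
}
```

So whether surrogates leak into the encoded bytes seems to depend on which encoder you go through, not only on `char` being 16 bits.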

1 comment

If you use the string methods that work with code points instead of chars, you rarely if ever have to deal with surrogate pairs in Java.
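A rough sketch of what that looks like in practice, using the code point methods on `String` (`codePointCount`, `codePoints`, `offsetByCodePoints`). The class and helper names are hypothetical and only there to make the example self-contained.

```java
// Hypothetical demo class; the helper names are illustrative only.
public class CodePointDemo {
    // "a", U+1F600 (stored as a surrogate pair), "b"
    static final String SAMPLE = "a" + new String(Character.toChars(0x1F600)) + "b";

    // Length in code points rather than in 16-bit chars.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // All code points in order; a surrogate pair comes back as one int.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    // Prefix containing at most n code points, never splitting a pair.
    static String firstCodePoints(String s, int n) {
        int count = Math.min(n, codePointLength(s));
        int end = s.offsetByCodePoints(0, count);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("chars:       " + SAMPLE.length());
        System.out.println("code points: " + codePointLength(SAMPLE));
        for (int cp : codePointsOf(SAMPLE)) {
            System.out.printf("U+%04X%n", cp);
        }
        System.out.println("first two:   " + firstCodePoints(SAMPLE, 2));
    }
}
```

The catch is that `length()`, `charAt`, and `substring` still count chars, so indices have to go through `offsetByCodePoints` if you want to stay in code point terms.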