by chrismorgan 1616 days ago
Technical amendment: UTF-16 can represent the full range of Unicode scalar values with surrogate pairs. Code points include the surrogates U+D800–U+DFFF; scalar values don’t. Like all other Unicode encodings, UTF-16 cannot represent surrogates.
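
To make the code point versus scalar value distinction concrete, here is a minimal Python sketch of UTF-16 encoding for a single value. The function name `encode_scalar_utf16` is made up for illustration and is not from any library. It accepts scalar values only, so the surrogate range is rejected rather than encoded.

```python
def encode_scalar_utf16(scalar: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit code units.

    Hypothetical helper for illustration only.
    """
    if not 0 <= scalar <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= scalar <= 0xDFFF:
        # Surrogates are code points but not scalar values, so
        # well-formed UTF-16 has no way to encode them.
        raise ValueError("surrogate code points are not scalar values")
    if scalar < 0x10000:
        # Basic Multilingual Plane: one code unit, value unchanged.
        return [scalar]
    # Supplementary planes: split the 20-bit offset into a
    # high (lead) surrogate and a low (trail) surrogate.
    offset = scalar - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


# U+1F600 becomes the surrogate pair D83D DE00.
pair = encode_scalar_utf16(0x1F600)
```

Python's own strict codec draws the same line: `"\ud800".encode("utf-16-le")` raises `UnicodeEncodeError`.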

That’s where the real problem lies: almost nothing that uses UTF-16 actually uses UTF-16, but rather potentially ill-formed UTF-16.
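
As a rough illustration of what "potentially ill-formed UTF-16" means, the following Python sketch checks a sequence of 16-bit code units for unpaired surrogates. The name `is_well_formed_utf16` is hypothetical, and the sketch assumes the input is already a list of integers in the range 0 to 0xFFFF.

```python
def is_well_formed_utf16(units: list[int]) -> bool:
    """Return True if every surrogate in `units` is part of a valid pair.

    Hypothetical helper for illustration; assumes each item is a
    16-bit code unit (0..0xFFFF).
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


paired = is_well_formed_utf16([0x0041, 0xD83D, 0xDE00])  # valid pair
lone = is_well_formed_utf16([0x0041, 0xD83D])            # unpaired high surrogate
```

An API that accepts arbitrary 16-bit unit sequences will happily store the second input, which is why such strings cannot always be converted to UTF-8 or to a sequence of scalar values without loss or an error.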