

	by jeberle 330 days ago
	You forgot `- 2^11` for the surrogate code points. Gee, why isn't Unicode 2^21 code points? To understand the Unicode code point space you must understand UTF-16. The code space is defined by how UTF-16 works. That was my initial point.
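For anyone following the arithmetic, here is a rough sketch of where those numbers come from, assuming the standard UTF-16 surrogate ranges (U+D800-U+DBFF high, U+DC00-U+DFFF low). The constant names are made up for illustration.

```python
# Sketch of the code space arithmetic implied by UTF-16.
# All names here are illustrative, not taken from any library.

BMP = 2 ** 16                          # U+0000..U+FFFF, one 16-bit unit each
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1  # 1024 lead values
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1   # 1024 trail values

# Each (high, low) pair addresses one supplementary code point.
SUPPLEMENTARY = HIGH_SURROGATES * LOW_SURROGATES  # 2^20

CODE_SPACE = BMP + SUPPLEMENTARY                  # U+0000..U+10FFFF
SURROGATES = HIGH_SURROGATES + LOW_SURROGATES     # the 2^11 being subtracted
SCALAR_VALUES = CODE_SPACE - SURROGATES

print(hex(CODE_SPACE), SURROGATES, SCALAR_VALUES)
```

So the ceiling is 17 planes of 2^16 code points rather than 2^21, which is about what you would expect if the limit falls out of what a surrogate pair can address.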


jcranmer 330 days ago

If you're going to count the surrogates as not-a-Unicode-codepoint, you should also count the noncharacters: the last two codepoints on each of the 17 planes and the range U+FDD0-U+FDEF.

The expansion of Unicode beyond the BMP was designed to facilitate an upgrade compatibility path from UCS-2 systems, but it is extremely incorrect to somehow equate Unicode with UTF-16.
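To put a number on the noncharacters, a quick sketch that enumerates them from the description above (last two code points of every plane, plus U+FDD0-U+FDEF). The helper name `is_noncharacter` is hypothetical, not a standard library function.

```python
# Sketch: enumerate noncharacters from the two rules described above.
# is_noncharacter is an illustrative helper, not part of any library.

def is_noncharacter(cp: int) -> bool:
    in_fdd0_block = 0xFDD0 <= cp <= 0xFDEF
    # Low 16 bits equal to FFFE or FFFF: the last two code points of a plane.
    last_two_of_plane = (cp & 0xFFFE) == 0xFFFE
    return in_fdd0_block or last_two_of_plane

NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]

print(len(NONCHARACTERS))
print([f"U+{cp:04X}" for cp in NONCHARACTERS[-2:]])
```

That is 32 from the U+FDD0 block plus 2 per plane across 17 planes.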


kbolino 330 days ago

FWIW, there is an official term for "code points excluding surrogates": "Unicode scalar value".
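A minimal sketch of that definition, with a check against what Python's own UTF-8 encoder accepts. Both function names are made up for illustration; the only real APIs used are `chr` and `str.encode`.

```python
# Sketch: "Unicode scalar value" = any code point except the surrogates.
# Function names are illustrative assumptions.

def is_scalar_value(cp: int) -> bool:
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

def encodes_as_utf8(cp: int) -> bool:
    # Python's strict UTF-8 encoder rejects lone surrogates.
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000, 0xFFFE, 0x10FFFF):
    print(f"U+{cp:04X}", is_scalar_value(cp), encodes_as_utf8(cp))
```

Note that a noncharacter such as U+FFFE still counts as a scalar value and still encodes; only the surrogate range is excluded.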


jeberle 329 days ago

OK, I'm lost here. Why is there a 1:1 correspondence between the two?
