HN Mirror



	by WorldMaker 1100 days ago
	UTF-8's original specification included 5-byte and 6-byte encodings to cover the complete 31-bit code space, but later specifications (RFC 3629 onward) mark those sequences as invalid. The reason is that UTF-16 tops out at U+10FFFF, which fits in 21 bits, and the specifications were aligned to that limit rather than fixing the bugs in UTF-16 (or scrapping UTF-16 altogether). In theory, the UTF-8 scheme could even be extended beyond 6-byte encodings (and UTF-32 widened to 8-byte units and beyond) if a larger code space (say, 63-bit code points) ever needed to open up. No one expects that any time soon, of course: Unicode today is nowhere near filling 21 bits, much less 31. Needing more would be a massive shock, and the compatibility headache would be terrible, because either UTF-16 would break or all of today's software that hard-codes the assumption that UTF-8 never goes past 4-byte encodings would.
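
	To make the byte counts concrete, here is a minimal Python sketch of the original 1-to-6-byte layout. The function name `encode_utf8_legacy` is made up for this illustration and is not part of any library; the bit layout follows the pre-RFC 3629 scheme as I understand it, so treat it as a sketch rather than a reference implementation.

```python
def encode_utf8_legacy(cp: int) -> bytes:
    """Encode a code point with the original 1-to-6-byte UTF-8 layout (up to 31 bits).

    Hypothetical helper for illustration only: modern UTF-8 stops at U+10FFFF,
    so conforming decoders reject the 5- and 6-byte forms produced here.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # ASCII is encoded as a single byte

    # (payload bits available, sequence length, lead-byte marker)
    layouts = (
        (11, 2, 0xC0),
        (16, 3, 0xE0),
        (21, 4, 0xF0),
        (26, 5, 0xF8),  # invalid in modern UTF-8
        (31, 6, 0xFC),  # invalid in modern UTF-8
    )
    for bits, length, lead in layouts:
        if cp < (1 << bits):
            break

    out = bytearray(length)
    # Fill continuation bytes from the end, 6 payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    out[0] = lead | cp  # remaining high bits go into the lead byte
    return bytes(out)


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10FFFF, 0x200000, 0x7FFFFFFF):
        print(f"U+{cp:X}: {encode_utf8_legacy(cp).hex(' ')}")
```

	Up to U+10FFFF the output matches Python's built-in `str.encode("utf-8")`. Past that point the built-in route is closed off entirely: `chr(0x110000)` raises `ValueError`, which is exactly the kind of hard-coded limit that would make widening the code space painful.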