by Someone 30 days ago
>> Unicode code points are 32 bit

> 21-bit, actually

Less than that. https://en.wikipedia.org/wiki/Code_point#In_character_encodi...:

“The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2¹⁶) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112”

That makes it log(1,114,112)/log(2) bits. That’s about 20.09.

(https://www.unicode.org/versions/Unicode17.0.0/ assigns 159,801 of them to characters)
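For anyone who wants to check the arithmetic, here is a quick Python sketch. The variable names are my own, and it only restates the figures quoted above (17 planes of 2¹⁶ code points each). It also shows why "21-bit" is still the practical answer: you can't store 20.09 bits, so a whole-bit field needs 21.

```python
import math

PLANES = 17                  # BMP plus 16 supplementary planes
POINTS_PER_PLANE = 1 << 16   # 65,536 code points per plane

code_space = PLANES * POINTS_PER_PLANE    # 1,114,112 code points in total
max_code_point = code_space - 1           # highest code point, 0x10FFFF

# Fractional size of the code space: log2 of the number of code points.
fractional_bits = math.log2(code_space)

# Whole bits needed to store any single code point as an integer.
whole_bits = max_code_point.bit_length()

print(code_space, hex(max_code_point), round(fractional_bits, 2), whole_bits)
# prints: 1114112 0x10ffff 20.09 21
```

So both numbers are right in their own sense: about 20.09 bits of information per code point, 21 bits if you need a fixed-width integer.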

2 comments

Don't know why you are being downvoted (or my grandparent comment, for that matter). You are correct.
Sorry, I was thinking of 0x1FFFFF as the end, but it’s 0x10FFFF. Forgetful.