by masklinn 5401 days ago
> High/Low surrogates

Surrogates are not codepoints.

2 comments

That's not right. Surrogates are code points. That's the whole idea! It means you can express characters beyond the BMP in legacy encodings that were designed back when the entire code space fit in 16 bits. Newer encodings like UTF-8 don't need surrogates for this because they have that capability at the byte-encoding level.

http://www.google.com/search?&q=site%3Aunicode.org+surro...

Each half of the pair is a single code point; the two combine to make one character.
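
For anyone who wants to see the pairing discussed above in concrete terms, here is a rough Python sketch of the standard UTF-16 arithmetic. The helper names `to_surrogates` and `from_surrogates` are made up for illustration, and U+1F600 is just an arbitrary character outside the BMP.

```python
def to_surrogates(cp):
    """Split a code point beyond the BMP into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1F600                           # illustrative choice, outside the BMP
high, low = to_surrogates(cp)
print(f"U+{cp:X} -> U+{high:X} U+{low:X}")

# UTF-16 stores the two surrogates as two 16-bit units;
# UTF-8 encodes the same character directly as four bytes, no surrogates.
print(chr(cp).encode("utf-16-be").hex())
print(chr(cp).encode("utf-8").hex())
```

Each surrogate sits in its own reserved range (U+D800..U+DBFF for high, U+DC00..U+DFFF for low), which is why a decoder can tell the two halves apart.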