Hacker News new | ask | show | jobs
by pjscott 4949 days ago
Be careful not to confuse UTF-16 and UCS-2. UTF-16 is a variable-width encoding, so using it doesn't actually make counting anything easier. UCS-2 is fixed-width, and evil. With UCS-2 you can easily count the number of code points, as long as they fall in the BMP (!), but this is not the same as counting graphemes.