|
A "code unit" exists in UTF-8 and UTF-32 as well; code units are not unique to UTF-16.[1] UTF-8's relationship with code points is approximately the same as UTF-16's, except that UTF-8 systems tend to handle code points better, because if they didn't, things would break a lot sooner, whereas code that ignores them mostly works in UTF-16. Your entire argument that graphemes are a poor way to deal with Unicode seems to be that current programming languages don't use graphemes and instead deal in a mix of code units and code points. But the article here shows a number of cases where that doesn't break down, and the person you're responding to clearly points out that, for the cases covered in the article, graphemes are the way to go (and he's correct). Graphemes aren't always the correct unit (and I don't think your parent was advocating that), just as code units or code points aren't always the right thing to count. It's highly dependent on the problem at hand. The bigger issue is that programming languages make the default something that's often wrong, when they probably ought to force the programmer to choose, and so most code ends up buggy. Worse, some languages, like JavaScript, provide no tooling within their standard library for some of the common ways of dealing with Unicode, such as code points. [1]: http://unicode.org/glossary/#code_unit |
Any time you define an upper limit, someone will come up with more emojis that require a larger number of code points per grapheme.
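
To make the distinction in this thread concrete, here is a rough Python sketch that counts the same string in UTF-8 code units, UTF-16 code units, and code points. The helper name `unit_counts` and the sample strings are my own illustration, not something from the article. Python's standard library has no grapheme segmentation, so the grapheme counts are only stated in comments, on the assumption that a renderer following the Unicode segmentation rules treats each sample as one user-perceived character.

```python
def unit_counts(text: str) -> dict:
    """Count one string three ways (hypothetical helper, for illustration)."""
    return {
        # UTF-8 code units are bytes.
        "utf8_code_units": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each; "-le" avoids counting a BOM.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # A Python str is a sequence of code points.
        "code_points": len(text),
    }


# "e" followed by COMBINING ACUTE ACCENT: normally one grapheme.
accented = "e\u0301"

# Man, woman, girl, boy joined by ZERO WIDTH JOINER: a family emoji,
# normally one grapheme where the sequence is supported.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# Combining marks can keep stacking on a base letter, so the number of
# code points per grapheme has no fixed ceiling.
stacked = "e" + "\u0301" * 50

for sample in (accented, family, stacked):
    print(unit_counts(sample))
```

None of the three numbers matches what a user would call "one character" for these samples, which is why the right unit depends on whether you are sizing a buffer, indexing into an encoding, or moving a cursor.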