by eridius 3054 days ago
<nitpick>The Unicode standard does not have a single definition for "character" because there are multiple interpretations. One reasonable interpretation is "a grapheme cluster".</nitpick>

More specifically, here's how the Unicode Consortium glossary defines "Character":

> Character. (1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader’s understanding. (2) Synonym for abstract character. (3) The basic unit of encoding for the Unicode character encoding. (4) The English name for the ideographic written elements of Chinese origin. [See ideograph (2).]


I don't see which of those 4 definitions supports the grapheme cluster interpretation.
The very first one. é has semantic value. ´ by itself doesn't.
Of course it does, because it can be combined with other characters. This is its semantic meaning: https://en.wikipedia.org/wiki/Acute_accent
An accent mark by itself has zero semantic meaning in a written context. It's a modifier. But you need to know what it's modifying in order to assign it any sort of meaning. We're talking about semantic meaning within the context of a written language, not technical details.
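To make the é vs ´ distinction concrete at the encoding level, here is a small sketch using Python's standard `unicodedata` module. It assumes the accent in the thread is the combining acute accent (U+0301) when attached to a letter, and the spacing acute accent (U+00B4) when written alone; the thread doesn't say which code points were meant. It only counts code points and inspects properties. It does not perform grapheme-cluster segmentation itself.

```python
import unicodedata

# Two encodings of the same user-perceived character "é" (assumed code points).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# Counted in code points, the two spellings have different lengths.
print(len(precomposed), len(decomposed))  # 1 2

# NFC normalization composes the pair back into the single precomposed code point.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)  # True

# The combining accent is a nonspacing mark (general category Mn):
# it is meant to modify a preceding base character.
print(unicodedata.name("\u0301"), unicodedata.category("\u0301"))
# COMBINING ACUTE ACCENT Mn

# The standalone spacing accent U+00B4 is a modifier symbol (general category Sk).
print(unicodedata.name("\u00b4"), unicodedata.category("\u00b4"))
# ACUTE ACCENT Sk
```

Both spellings of é display as one grapheme cluster, even though one is a single code point and the other is two. That gap between "code point" and "what a reader sees as one character" is the one the grapheme-cluster interpretation addresses.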