Hacker News new | ask | show | jobs
by HelloNurse 3296 days ago
Graphemes? Strings can be expected to be transformed so that they contain the "right" type of characters (in particular, plain letters followed by the respective combining diacritical marks vs a smaller number of precomposed accented letters) before asking about their length, which is how many characters they contain. If you want to count "graphemes" you can have a different function and possibly a separate cached result.

This in theory. It is readily apparent from the Data.Text page at https://hackage.haskell.org/package/text-1.2.2.2/docs/Data-T... that the authors care about performance only selectively ("fusion" of buffer allocations is fun, relatively messy data structures to cache important data such as string length are not fun), and that they don't care enough about Unicode to add serious string-level abstractions over the Unicode tables exposed in the Char type.