|
|
|
|
|
by leshow
3296 days ago
|
|
It's hard to say what length should be because it could mean different things. How many u8 values there are? How many graphemes are present? If it's just counting how many u8's in a slice, that's trivial. But storing the total number of variable-length graphemes isn't. I assume that's why Data.Text's length function is O(n), and other languages that have UTF-16 or UTF-8 strings also have length functions that are linear. |
|
This in theory. It is readily apparent from the Data.Text page at https://hackage.haskell.org/package/text-1.2.2.2/docs/Data-T... that the authors care about performance only selectively ("fusion" of buffer allocations is fun, relatively messy data structures to cache important data such as string length are not fun), and that they don't care enough about Unicode to add serious string-level abstractions over the Unicode tables exposed in the Char type.