by mbell 3257 days ago
> The key thing to remember is that iteration over Unicode strings only makes sense as iteration over code points, not UCS-2 characters, not bytes, not grapheme clusters.

There are tons of situations where iterating over grapheme clusters is what you want to do.
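
As a rough illustration of one such situation (reversing a string), here is a minimal Python sketch. The helper `approx_graphemes` is hypothetical and deliberately simplified: it only attaches combining marks to the preceding base code point, whereas real grapheme segmentation follows UAX #29 and also covers emoji ZWJ sequences, Hangul jamo, regional indicators and more.

```python
import unicodedata

def approx_graphemes(text):
    """Group each base code point with the combining marks that follow it.

    Assumption/simplification: this is NOT full UAX #29 segmentation.
    """
    cluster = ""
    for ch in text:
        # A code point with combining class 0 starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster

word = "cafe\u0301"  # "café" spelled with a combining acute accent
clusters = list(approx_graphemes(word))

# Reversing by code point detaches the accent from its "e".
reversed_by_code_point = word[::-1]
# Reversing by cluster keeps "e" + accent together.
reversed_by_cluster = "".join(reversed(clusters))
```

Here `word` has five code points but four clusters, and only the cluster-wise reversal keeps the accent on the "e". For real use, a dedicated segmentation library is probably the safer choice.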

1 comments

And there are tons of situations where you want neither of the two (e.g. NFD vs. NFC). The Cairo graphics library has utilities for text rendering that its reference explicitly calls "toy text" functions, leaving serious rendering to Pango. That's fair. Languages should not call their strings "unicode strings" unless these cases are covered in detail by special libraries with distinct names for ucp/ucs/etc. lengths, iterators, and so on. There is no such thing as string length or "char" anymore. A string is blank or non-blank; anything beyond that is too complex to be part of any stdlib. Even "blank" is not so obvious today.
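
To make the "no such thing as string length" point concrete, here is a small Python sketch using only the standard library. The helper name `lengths` and the sample strings are my own choices for illustration, not anything from the comment above.

```python
import unicodedata

def lengths(text):
    """Return three different "lengths" of the same string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # utf-16-le avoids the BOM, so bytes // 2 is the code unit count.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }

nfc = unicodedata.normalize("NFC", "\u00e9")  # "é" as one code point
nfd = unicodedata.normalize("NFD", "\u00e9")  # "e" + combining acute
emoji = "\U0001F600"                          # outside the BMP

nfc_lengths = lengths(nfc)
nfd_lengths = lengths(nfd)
emoji_lengths = lengths(emoji)

# The two forms render the same but compare unequal until normalized.
same_raw = nfc == nfd
same_normalized = unicodedata.normalize("NFC", nfd) == nfc

# "Blank" is also fuzzy: a zero-width space is invisible, yet
# str.isspace() does not treat it as whitespace.
zero_width_is_space = "\u200b".isspace()
no_break_is_space = "\u00a0".isspace()
```

The same visible "é" comes out as 1 or 2 code points and 2 or 3 UTF-8 bytes depending on normalization, and the emoji is 1 code point but 2 UTF-16 code units. None of these counts is "the" length, and none of them counts grapheme clusters.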