by cursork 4580 days ago
Perl is actually very good with Unicode. Note that a character is "The smallest component of written language that has semantic value" according to the Unicode glossary - I'd say Perl respects that meaning. As noted in the docs, graphemes can be handled with \X in regular expressions (although admittedly that's not pretty):

    my $length = 0;
    $length++ while $dec =~ /\X/g;  # each \X match is one extended grapheme cluster
Note that a grapheme is defined as "A minimally distinctive unit of writing in the context of a particular writing system" - i.e. context is required to determine what a grapheme actually is, as a few others have pointed out. Given the definitions from Unicode, Perl does a pretty good job (especially when using Unicode::Normalize to normalize input).
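
To make the difference concrete, here is a rough sketch comparing length, a \X count, and the effect of normalizing with NFC from Unicode::Normalize. The sample string (an "e" followed by U+0301 COMBINING ACUTE ACCENT) and the variable names are my own assumptions for illustration, not anything from the docs:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Assumed sample input: "e" + U+0301 COMBINING ACUTE ACCENT.
# That is two code points which render as one user-visible "é".
my $dec = "e\x{301}";

# length() counts characters in the Unicode sense (code points).
my $chars = length $dec;

# Counting \X matches counts extended grapheme clusters instead.
# The "= () =" idiom forces list context and yields the match count.
my $graphemes = () = $dec =~ /\X/g;

# NFC composes the pair into the single code point U+00E9,
# so after normalization length() agrees with the grapheme count here.
my $nfc = NFC($dec);

print "code points: $chars, graphemes: $graphemes, ",
      "code points after NFC: ", length($nfc), "\n";
```

Normalizing only helps where a precomposed form exists; for sequences with no composed equivalent, counting with \X is still the way to go.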

http://www.unicode.org/glossary/