

by rurban 2801 days ago

They are still mostly not multi-byte string (i.e. Unicode) aware, even after decades of work. That is, you cannot really search for strings with case-folding or across normalized variants.
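To illustrate the search problem, here is a rough Python sketch (not how any of these tools work internally). `fold_key` is a hypothetical helper of mine, and the NFD, casefold, NFD sequence is only an approximation of canonical caseless matching:

```python
import unicodedata


def fold_key(s: str) -> str:
    """Hypothetical helper: build a key for case- and normalization-insensitive search."""
    # Decompose first so that case folding sees base letters plus combining marks,
    # then decompose again because folding can produce new composable sequences.
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", decomposed.casefold())


composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# A plain byte search misses the match: the two spellings have different bytes.
byte_match = composed.encode("utf-8") in ("Un " + decomposed).encode("utf-8")

# After folding and normalizing both sides, the match is found, even across case.
folded_match = fold_key(composed) in fold_key("UN " + decomposed.upper())
```

Both spellings render identically on screen, which is why a byte-oriented search that misses one of them is so surprising to users.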

See http://crashcourse.housegordon.org/coreutils-multibyte-suppo... and http://perl11.org/blog/foldcase.html for an overview of the performance problems.

This tool only handles the minor task of validating the UTF-8 encoding, nothing else. The major tasks of decoding, folding and normalization still remain to be done.
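A small Python sketch of how validation differs from the remaining steps. The function names are made up for illustration, and the choice of NFC here is an assumption; a real tool might pick another normalization form:

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Validation only: reports whether the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def search_key(data: bytes) -> str:
    """Decode, normalize and fold, so equivalent spellings compare equal."""
    text = data.decode("utf-8")                # decoding (raises on invalid input)
    text = unicodedata.normalize("NFC", text)  # normalization
    text = text.casefold()                     # case folding
    return unicodedata.normalize("NFC", text)  # folding may undo normalization


upper = "\u00c9".encode("utf-8")   # "É" as one code point
lower = "e\u0301".encode("utf-8")  # "e" plus combining acute accent
```

Both `upper` and `lower` pass validation, yet the byte strings differ; they only compare equal once `search_key` has been applied to each.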