by ptx
4171 days ago
From a quick skim through the docs[0], Perl 5.8's Unicode support sounds a lot like Python 2's, except that the default encoding is latin1 instead of ASCII. Python 3 explicitly decodes binary data into Unicode text at the point where it's read, where the encoding of the data is presumably known. Perl 5.8 instead defers decoding until the data is used, at some completely unrelated point in the program, and then decodes it with some assumed, globally agreed-upon encoding.

Since there is no globally agreed-upon encoding (Windows even uses different legacy encodings for Win32 and the command prompt!), this will appear to work as long as the data is ASCII. Later, when you least expect it (when someone inputs a non-ASCII string), it will give a UnicodeDecodeError in Python 2 or garbage data in Perl 5.8.

Ned Batchelder gave an excellent talk[1] that explains how the Python 3 approach to Unicode works. I think it makes a lot more sense once you understand it; the Python 2 way was clearly broken, and it looks like Perl 5 has the same problem but hides it better.

[0] http://search.cpan.org/~jhi/perl-5.8.0/pod/perluniintro.pod
[1] http://m.youtube.com/watch?v=sgHbC6udIqc
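The difference between decoding at the input boundary and decoding later with an assumed default can be sketched in Python 3. This is only an illustration, assuming the input bytes are UTF-8; `decode_with_assumed_default` is a hypothetical helper that emulates a global default encoding. It is not how Python 2 or Perl 5.8 are actually implemented.

```python
# Raw bytes as they might arrive from a file or socket. We assume the
# encoding (UTF-8) is known at this input boundary.
raw = "naïve café".encode("utf-8")

# Python 3 style: decode once, where the data enters the program.
text = raw.decode("utf-8")
print(text)  # naïve café


# Hypothetical helper emulating "decode later with an assumed global default".
def decode_with_assumed_default(data: bytes, default: str = "ascii") -> str:
    return data.decode(default)


# Pure-ASCII input works with either default, so the bug stays hidden.
print(decode_with_assumed_default(b"hello"))  # hello

# Python 2 style failure, emulated: an ASCII default raises on non-ASCII data.
try:
    decode_with_assumed_default(raw)
except UnicodeDecodeError as exc:
    print("ASCII default failed:", exc.reason)

# Perl 5.8 style failure, emulated: a latin1 default never raises,
# it silently produces garbage (mojibake) instead.
garbled = decode_with_assumed_default(raw, "latin-1")
print(garbled)  # naÃ¯ve cafÃ©
```

The latin1 case is arguably worse: every byte sequence is valid latin1, so there is no error to alert you that the data was misinterpreted.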
Also, according to perlunifaq, the minimum version you should be using is 5.8.1, which the documentation for 5.8.0 would of course not mention.
Really, if you want good Unicode handling, you should probably use 5.16.0 or later. If you want the latest version of Unicode, there are ways to change which version of Unicode Perl is compiled with, but it is easier to just use the latest version of Perl, which is 5.20.
http://perldoc.perl.org/perluniintro.html http://perldoc.perl.org/perlunitut.html http://perldoc.perl.org/perlunifaq.html http://perldoc.perl.org/perlunicode.html
P.S. I noticed in the Python talk you linked that no one knew the pile of poo symbol is in there because the Japanese characters for luck and poo are very similar. (I am unable to find a link to where I first read this.) The Japanese are also the reason we call them emoji (e means picture and moji means character). http://www.fastcompany.com/3037803/the-oral-history-of-the-p...