by peteretep 4946 days ago
Good to see so much active work on Unicode support. Really feels like Perl is coming into its own as the language to use with Unicode text...

Unless you've been living under a rock, Perl has been the go-to language for Unicode support for half a decade now.
Indeed, the "Unicode support" in question is tracking changes to the Unicode standard itself. Perl 5 has long been one of the top-flight Unicode implementations (and not just an implementation, but also a community-wide set of best practices, usage documentation, etc.).
I must say, when Perl is the "top flight" implementation for Unicode, I don't want to be a programmer anymore...

http://stackoverflow.com/questions/6162484/why-does-modern-p... Just scroll to the end of the first comment and look at the boilerplate code. Jesus...

This is very thorough boilerplate code for dealing with all corner cases when using utf-8 data with Perl.

Instead of dismissing or dissing Tom Christiansen's excellent post, I would highly recommend reading his "The Good, the Bad, & the (mostly) Ugly" presentation from OSCON 2011 [1], where he compares Unicode handling across mainstream languages, and then seeing how this code (and Perl) shapes up in comparison.

In the meantime pragmatic Perl programmers can cover most of that utf-8 boilerplate with just:

  use 5.016;
  use warnings;
  use utf8::all;
Or if you're like me and use perl5i [2], then it's just:

  use perl5i::2;

[1]: http://training.perl.com/OSCON2011/index.html

[2]: https://metacpan.org/module/perl5i
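
For anyone who can't install those CPAN modules, here is a rough core-only sketch of the most common part of that boilerplate. It is an approximation, not a statement of everything utf8::all does, and `count_chars` is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;
use utf8;                              # the source file itself is UTF-8
use open qw(:std :encoding(UTF-8));    # UTF-8 layers on STDIN/STDOUT/STDERR and as the default for open()
use Encode qw(decode);

# Command-line arguments arrive as raw bytes, so decode them up front.
@ARGV = map { decode('UTF-8', $_) } @ARGV;

# Hypothetical helper: number of characters in a UTF-8 encoded byte string.
sub count_chars {
    my ($bytes) = @_;
    return length(decode('UTF-8', $bytes));
}

my $bytes = "na\xC3\xAFve";            # the UTF-8 bytes for "naïve"
print length($bytes), " bytes, ", count_chars($bytes), " characters\n";
```

The point is the usual one: decode at the edges of the program, work in characters inside it, and let the I/O layers encode on the way out.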

Perl 5 was released in October 1994, so it's impressive in its own right that a) there is boilerplate you can add at all to get good Unicode support and b) you can extend the language to support it using just boilerplate.

As the other comment mentioned, improvements to the default Unicode support do get included in later Perl 5 releases, but you have to let the compiler/interpreter know that you're opting in to them so that it can reduce the boilerplate for you.
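
One place where that opt-in is visible is the `unicode_strings` feature (available since Perl 5.12). The sketch below assumes no `use locale` is in effect; the two sub names are made up for the example.

```perl
use strict;
use warnings;

# Legacy semantics: the feature is not enabled in this scope.
sub upcase_legacy {
    my ($s) = @_;
    return uc $s;
}

# Opt-in semantics, lexically scoped to this block only.
{
    use feature 'unicode_strings';
    sub upcase_unicode {
        my ($s) = @_;
        return uc $s;
    }
}

my $e_acute = chr(0xE9);   # U+00E9, held internally as a single byte
printf "legacy: %s\n", upcase_legacy($e_acute)  eq chr(0xC9) ? "upper-cased" : "unchanged";
printf "opt-in: %s\n", upcase_unicode($e_acute) eq chr(0xC9) ? "upper-cased" : "unchanged";
```

Old code that never asks for the feature keeps its old behaviour, which is why the default can't simply be flipped for everyone.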

Is that really laudable in this day and age?

I mean, Java had Unicode support right out of the gate in 1995. Not that it's a wonderful text processing language, but you'd think 17 years is plenty of time to catch up...

Supporting Unicode circa 1995 was good, but supporting an outdated version of Unicode incompletely isn't great. Per Tom Christiansen, JDK7 looks like the minimum required Java version to do modern Unicode correctly.

http://training.perl.com/OSCON2011/index.html