by singingfish 1378 days ago
One of my criteria for good tools is that they scale well from the smallest possible use case to the absolutely massive. Git meets this criterion, for example.

Anyway, I manage a 250k-line code base written over 20 years, which is in surprisingly good shape considering its age and how many people have touched it. The last time we upgraded Perl, for the first time in a decade and across the addition of many features and major internal changes (e.g. Unicode, optimisation), the total number of lines of code we needed to change was at most 50. We also had to do very little fiddling with the underlying CPAN libraries.

Back to the point. Throwaway script: perfect candidate. Code capable of running the money pump for a billion-dollar company: also just as fine as any other similarly capable environment, better than some, though trickier to manage a team around than others.


"Perl makes easy things easy and hard things possible."
Getting strings to have the right encodings should be easy. On the last Perl codebase I touched it's proven impossible for all practical intents and purposes.
It's markedly easier than with Python, though. Here's a short script that will recode a file with mixed ISO-8859-1 and UTF-8 data into proper UTF-8:

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    use Encode qw( decode FB_QUIET );
    
    binmode STDIN, ':bytes';
    binmode STDOUT, ':encoding(UTF-8)';
    
    my $out;
    
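    while ( <> ) {
        $out = '';
        # With FB_QUIET, decode() consumes what it can and leaves the rest in $_.
        while ( length ) {
            # Decode the longest valid UTF-8 prefix.
            $out .= decode( "utf-8", $_, FB_QUIET );
            # If bytes remain, the next one is not valid UTF-8: treat it as ISO-8859-1.
            $out .= decode( "iso-8859-1", substr( $_, 0, 1 ), FB_QUIET ) if length;
        }
        print $out;
    }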
Thanks for posting the happily ignorant code snippet that I have been waiting for.

The problem is that Perl internally encodes strings as sequences of numbers. Not even sequences of bytes, but sequences of numbers that could either be codepoints or bytes resulting from the encoding of such a sequence of codepoints. ...as a developer you are perfectly free to make this assumption any way you please at any given point in your codebase. It's not even clear that either of the two is particularly "preferred" at large or a best practice or anything like that.

To make things worse, there is no way to know which is which, i.e. a string itself is happily ignorant about the assumptions that people will/should make about it. And Perl will happily concatenate strings making different kinds of assumptions, or double- or triple-encode them as you please, or decode something that hasn't been encoded in the first place.
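To make the double-encoding case concrete, here is a minimal sketch using only the core Encode module. The sample word and the variable names are made up for illustration; the point is just that nothing stops the second encode() call.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# A character string: 7 code points, the second one is U+00F4.
my $chars = "h\x{F4}pital";

# Encoding once gives the correct UTF-8 octets (U+00F4 becomes C3 B4).
my $bytes = encode( 'UTF-8', $chars );

# Nothing marks $bytes as "already encoded", so encoding it again is accepted:
# each octet is treated as a code point in 0..255 and encoded once more.
my $twice = encode( 'UTF-8', $bytes );

printf "%d %d %d\n", length $chars, length $bytes, length $twice;   # prints "7 8 10"

# Decoding the double-encoded octets once does not give back $chars.
# It gives a character string holding the same numbers as $bytes,
# which is what shows up on screen as mojibake.
my $mojibake = decode( 'UTF-8', $twice );
print "same numbers as the byte string\n" if $mojibake eq $bytes;
print "not the original text\n"           if $mojibake ne $chars;
```

Both final messages are printed: as far as `eq` is concerned, the decoded mojibake and the once-encoded byte string are the same sequence of numbers.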

This leads to jumbles of numbers that aren't anything in particular. They simply work well enough for sloppy programmers to not realize when they are making mistakes, but badly enough to almost guarantee that encoding errors will crop up on users' screens regularly.

Now, given that this is how the language works, be my guest jumping into a 100k-LOC Perl codebase that dozens of programmers have touched over a decade, passing around and munging together strings not just within their own codebase, but also using strings stored to and retrieved from elsewhere, in some cases places where no one knows anymore where they initially came from or where they will ultimately go.

> Thanks for posting the happily ignorant code snippet that I have been waiting for.

Thank you for being so civil. IMO displaying a badly encoded string beats crashing on a runtime error most of the time. I'd rather see "hôpital" than "Error 500", if you will. Maybe don't think your personal assumptions carry any validity outside of your own choices, preferences, or uses.

I can imagine the difficulty of working with a huge codebase that lacks refactoring and maybe even predates UTF-8, but where would you be if it had originally been written in Python 2.5?

But that's precisely the point: Python 2.5 realized that something was fundamentally broken and the community went through a painful transition process. Transitioning to Python 3 meant getting your house in order where string encodings were concerned.

Any Python programmer would tell you: Starting a new project in 2022 in Python 2.5 is professional malpractice.

But that's what the original post seems to be saying: That Perl 5 has somehow managed to fix any of what was fundamentally wrong with it. ...and that couldn't be further from the truth. And people in this thread are saying that maybe they should have another look into Perl 5 as a serious option for starting out a new codebase in 2022. ...and that's a very bad idea.

Sure: If you started out a new codebase in Perl 5 in 2022, there are coding standards you could adopt to avoid getting yourself into a pickle where string encodings are concerned. But without the interpreter helping you out on that front, it'll produce ugly code, and take mental discipline and disciplined code reviewing practices on a team. It's solving a problem that Python solves for you so much more easily and effectively. You could go with Perl 6 / Raku, but why would you? What does it have to recommend it over Python or Ruby, other than a Perl programmer's nostalgia for being a little Perl-like?
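One such coding standard, as a rough sketch: decode strictly at every input boundary, keep only character strings inside the program, and encode at every output boundary. The helper names `from_wire` and `to_wire` are hypothetical, and the interpreter does nothing to enforce that callers actually use them.

```perl
use strict;
use warnings;
use Encode qw( encode decode FB_CROAK LEAVE_SRC );

# Hypothetical boundary helpers; the names are made up for this sketch.
# FB_CROAK makes invalid input die instead of being silently mangled,
# LEAVE_SRC keeps Encode from modifying the argument in place.
sub from_wire {
    my ($octets) = @_;
    return decode( 'UTF-8', $octets, FB_CROAK | LEAVE_SRC );
}

sub to_wire {
    my ($text) = @_;
    return encode( 'UTF-8', $text, FB_CROAK | LEAVE_SRC );
}

# Inside the program, everything is a character string.
my $text  = from_wire("h\xC3\xB4pital");   # 8 octets in, 7 characters out
my $label = uc $text;                      # case mapping works on characters
my $out   = to_wire($label);               # octets again, ready to be written

# Latin-1 bytes arriving where UTF-8 was promised are rejected at the boundary.
my $ok = eval { from_wire("h\xF4pital"); 1 };
print "rejected invalid input\n" unless $ok;
```

The discipline lives entirely in the convention: a call to `to_wire` on something that is already octets would still go through without complaint.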

You could say the transition from Perl 5 to Perl 6 is just like the transition from Python 2 to Python 3. The difference is: Perl is simply late by at least a decade.

The point that the article is trying to refute, namely that Perl is for dinosaurs, in my mind just absolutely stands.

> I'd rather see "hôpital" than "Error 500"

The debate between weak typing and strong typing is as old as the hills. But in much of the modern era, strong typing, of which Python is an example, seems to have decidedly prevailed.

What we need from a programming language is to make medium complexity things, at worst, medium difficulty.

I don’t care about hard problems or easy problems.

Erlang/OTP does medium difficulty things, i.e. very large applications with good fault tolerance and QoS, really well.

But it's a very different niche. Perl and Ruby scale to mid-sized applications quite well, but above that, fault tolerance and QoS become hard.

I think with languages like Perl, Ruby and Python you just need a static, compiled language to migrate to at a certain scale, preferably with similar features. Kotlin and Scala seem to be currently the best options for Ruby, Python and OO Perl. For procedural Perl maybe Golang.
Rust takes a lot of inspiration from Perl, so don't forget that one.
Wikipedia quotes Ruby as an influence, but not Perl, citing this source: https://doc.rust-lang.org/reference/influences.html

Still, Perl might have influenced Rust via Ruby. Otherwise, do you have a reference for Perl's direct influence on Rust?

Python has a number of similar, static, compiled languages that are embeddable in Python code (notably Cython and Taichi).