Getting strings to have the right encodings should be easy. On the last Perl codebase I touched, it proved impossible for all practical purposes.
Thanks for posting the happily ignorant code snippet that I have been waiting for.
The problem is that Perl internally represents strings as sequences of numbers. Not even sequences of bytes, but sequences of numbers that could be either codepoints or the bytes that result from encoding such a sequence of codepoints. Which of the two they are is purely an assumption ...and as a developer you are perfectly free to make that assumption any way you please at any given point in your codebase. It's not even clear that either of the two is particularly "preferred" in general, or a best practice, or anything like that.
To make things worse, there is no way to know which is which, i.e. a string itself is happily ignorant about the assumptions that people will/should make about it. And Perl will happily concatenate strings making different kinds of assumptions, or double- or triple-encode them as you please, or decode something that hasn't been encoded in the first place.
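To make the double-encoding problem concrete, here is a minimal sketch in Python 3 rather than Perl. It assumes an "untagged" string model like the one described above, emulated by reinterpreting each UTF-8 byte as the codepoint with the same number (which is exactly what Latin-1 decoding does). The helper names `numbers`, `encode_in_place` and `decode_in_place` are hypothetical.

```python
def numbers(s):
    """The raw sequence of numbers (codepoints) a string is made of."""
    return [ord(c) for c in s]

def encode_in_place(s):
    """Hypothetical helper emulating an untagged string type: UTF-8-encode the
    characters, then store each byte as the codepoint with the same number.
    Latin-1 maps bytes 0-255 onto codepoints 0-255 one-to-one."""
    return s.encode("utf-8").decode("latin-1")

def decode_in_place(s):
    """Inverse of encode_in_place: read each codepoint as a byte, UTF-8-decode."""
    return s.encode("latin-1").decode("utf-8")

word = "hôpital"               # numbers are codepoints
once = encode_in_place(word)   # same type, but the numbers are now UTF-8 bytes
twice = encode_in_place(once)  # double-encoded; nothing prevents it

print(numbers(word)[:3])       # [104, 244, 112]
print(numbers(once)[:4])       # [104, 195, 180, 112]
print(once)                    # hÃ´pital
print(len(word), len(once), len(twice))  # 7 8 10

# Decoding twice recovers the original, but only if you know to do it twice.
assert decode_in_place(decode_in_place(twice)) == word

# Decoding something that was never encoded: here 0xF4 followed by "p"
# is invalid UTF-8, so Python raises instead of guessing.
try:
    decode_in_place(word)
except UnicodeDecodeError as exc:
    print("not encoded in the first place:", exc.reason)
```

Nothing in the untagged model records how many times a string has been encoded, so `twice` is just as valid a value as `once`; only the programmer's bookkeeping tells them apart. Decoding `word` happens to fail here because of the "ô", but a pure-ASCII string would pass through `decode_in_place` unchanged and give no warning at all.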
This leads to jumbles of numbers that aren't anything in particular. They simply work well enough for sloppy programmers to not realize when they are making mistakes, but badly enough to almost guarantee that encoding errors will crop up on users' screens regularly.
Now, given that this is how the language works, be my guest jumping into a 100k-LOC Perl codebase that dozens of programmers have touched over a decade, passing around and munging together strings not just within their own codebase, but also strings stored to and retrieved from elsewhere, in some cases from places where no one knows anymore where they originally came from or where they will ultimately end up.
> Thanks for posting the happily ignorant code snippet that I have been waiting for.
Thank you for being so civil. IMO, displaying a badly encoded string beats crashing on a runtime error most of the time. I'd rather see "hÃ´pital" than "Error 500", if you will. Maybe don't think your personal assumptions hold any validity beyond your own choices, preferences, and use cases.
I can imagine the difficulty of working with a huge codebase that has never been refactored and may even predate UTF-8, but where would you be if it had originally been written in Python 2.5?
But that's precisely the point: Python 2.5 realized that something was fundamentally broken and the community went through a painful transition process. Transitioning to Python 3 meant getting your house in order where string encodings were concerned.
Any Python programmer would tell you: starting a new project in Python 2.5 in 2022 is professional malpractice.
But that's what the original post seems to be saying: that Perl 5 has somehow managed to fix what was fundamentally wrong with it ...and that couldn't be further from the truth. And people in this thread are saying that maybe they should take another look at Perl 5 as a serious option for starting a new codebase in 2022 ...and that's a very bad idea.
Sure: if you started a new codebase in Perl 5 in 2022, there are coding standards you could adopt to avoid getting yourself into a pickle where string encodings are concerned. But without the interpreter helping you out on that front, it'll produce ugly code, and take mental discipline and disciplined code reviewing practices on a team. It means solving by hand a problem that Python solves for you far more easily and effectively. You could go with Perl 6 / Raku, but why would you? What does it have to recommend it over Python or Ruby, other than being a little Perl-like for the sake of a Perl programmer's nostalgia?
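For comparison, a minimal Python 3 sketch of what "the interpreter helping you out" looks like: `str` (codepoints) and `bytes` are distinct types, so mixing them raises `TypeError` instead of silently producing a jumble, and conversion happens explicitly at the I/O boundary. The byte string stands in for data read from a file, socket, or database; the UTF-8 encoding and the `"Ward: "` label are illustrative assumptions.

```python
# Stand-in for bytes arriving from outside the program (file, socket, DB);
# the UTF-8 encoding here is an assumption.
raw = "hôpital".encode("utf-8")      # b'h\xc3\xb4pital'
label = "Ward: "

mixing_refused = False
try:
    label + raw                      # text + bytes
except TypeError as exc:
    mixing_refused = True
    print("refused:", exc)

# Decode once, at the boundary, naming the encoding explicitly...
text = raw.decode("utf-8")
message = label + text               # ...then work only with str inside.
print(message)                       # Ward: hôpital

# ...and encode once on the way back out.
out = message.encode("utf-8")
print(len(message), len(out))        # 13 14
```

The design choice is that the type itself records whether you are holding codepoints or bytes, so a missing decode shows up as an immediate exception on the line that caused it rather than as garbled text on a user's screen.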
You could say the transition from Perl 5 to Perl 6 is just like the transition from Python 2 to Python 3. The difference is: Perl is simply late by at least a decade.
The point that the article is trying to refute, namely that Perl is for dinosaurs, in my mind just absolutely stands.
> But that's precisely the point: Python 2.5 realized that something was fundamentally broken and the community went through a painful transition process.
It's still not done for many, many projects. "python2" is still installed on 99% of systems I touch. Off the top of my head, the only machine missing it is my laptop, actually.
So your last Perl codebase had the option of not going through this painful transition. Maybe that's what makes the most sense from a business point of view? "Worse is Better", remember: that's why we're running sloppy Linux everywhere and not the almost-perfect OpenVMS or Genera.
>But without the interpreter helping you out on that front, it'll produce ugly code, and take mental discipline and disciplined code reviewing practices on a team.
"Ugly code" is in the eye of the beholder. For many a Python project, I have to add long chains of elif or try/except blocks to check the type of incoming data returned from some API, where in Python 2.7 or Perl it could easily have been handled through duck typing. If the returned value switches types rarely enough, most programmers won't notice until the error occurs in production.
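A minimal Python 3 sketch of the kind of type-checking chain described above, assuming a hypothetical API field that may arrive as `str`, `bytes`, or a number. The function name `as_text` and the assumption that the API sends UTF-8 when it sends bytes are illustrative, not taken from any real API.

```python
def as_text(value):
    """Hypothetical normalizer for an API field that may arrive in several types."""
    if isinstance(value, str):
        return value
    elif isinstance(value, (bytes, bytearray)):
        # Assumption: the API sends UTF-8 whenever it sends bytes.
        return value.decode("utf-8")
    elif isinstance(value, (int, float)):
        return str(value)
    else:
        # A type nobody anticipated: this branch is where a rare case fails.
        raise TypeError(f"unexpected type from API: {type(value).__name__}")

for v in ["hôpital", "hôpital".encode("utf-8"), 42, 3.5]:
    print(repr(as_text(v)))
```

Any type the chain did not anticipate, such as `None`, falls through to the final branch; if that type only shows up occasionally, this is the error that surfaces in production.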
> The point that the article is trying to refute, namely that Perl is for dinosaurs, in my mind just absolutely stands.
To me, having to get my "house in order" without asking for it is basically non-consensual changes by the developer, and I think that is abusive.
With humans, my response to abuse is distancing myself from the source of the abuse, and with technologies it is no different.
Python is very popular, and I'm actually learning it right now, but I would not write anything important in it for another 20 years, since they just did ANOTHER breaking change in a minor version release.
> To me, having to get my "house in order" without asking for it is basically non-consensual changes by the developer, and I think that is abusive.
If you used Python 2 in a way where encoding errors wouldn't be an issue, and if you ran the Python 2 to 3 conversion tool (2to3) that shipped with Python 3, then in most cases you didn't have to do any work beyond that. ...If, however, you ran the tool and your code suddenly threw exceptions, then more often than not those were errors you needed to fix anyway, even if converting to Python 3 wasn't something you wanted to do. And they gave you 11 years to make the transition, between the release of Python 3.0 and the last release of Python 2.7.
I don't agree with that analogy about an abusive relationship.
It's more like a five-star restaurant asking you to please put on a shirt if you show up there in a bathing suit. It's just a norm, in this case cultural, that comes with five-star restaurants. If you don't want to follow it, you're free to go find a beach cafe somewhere.
The debate between weak typing and strong typing is as old as the hills. But in much of the modern era, strong typing, of which Python is an example, seems to have decisively prevailed.