Hacker News
by doxcf434 3769 days ago
I recently had to maintain some new perl code. I didn't think it would be a big deal, but found a number of things I take for granted today that perl hasn't kept up with:

1) The perl cpan module doesn't resolve dependencies

2) The cpan module has parsing errors when passing in a list of CPAN packages

3) You have to manually grep your perl code to see what modules it depends on

4) Module installs take a long time since they can compile and unit test the code. Unit tests can even make connections to the internet or try to access databases and fail, so you just have to force them to install

5) Non-interactive installs of CPAN modules require digging in the docs and learning that you need to set an env var to enable them

6) CPAN modules aren't used that heavily and can have bugs that would be caught in wider used modules. (e.g. the AWS EC2::* modules don't page results from AWS, so result sets can be incomplete, whereas the more widely used boto lib works correctly and is better maintained.)

7) Perl devs don't think twice about shelling out to an external binary (that may or may not be installed)

8) Even if regexs are not needed, inevitably the perl dev will use them since that's the perl hammer, and it's hard to know what the intention is with regexes or what the source data even looks like

9) You have to manually include the DataDumper package to debug data structs

10) You have to manually enable warnings and strict check, it's not on by default.
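
To make point 6 concrete: a client that reads only the first page of a paged API silently returns incomplete results. Below is a minimal Python sketch of the usual follow-the-token loop. `fetch_page`, `DATA`, and `PAGE_SIZE` are hypothetical stand-ins for a remote service; this is not the API of the EC2 modules or of boto.

```python
# Hypothetical in-memory "service" that hands out results a page at a time.
DATA = [f"i-{n:04d}" for n in range(7)]
PAGE_SIZE = 3


def fetch_page(token=None):
    """Return (items, next_token); next_token is None on the last page."""
    start = int(token) if token is not None else 0
    items = DATA[start:start + PAGE_SIZE]
    following = start + PAGE_SIZE
    next_token = str(following) if following < len(DATA) else None
    return items, next_token


def fetch_all():
    """Keep requesting pages until the service stops returning a token."""
    items, token = fetch_page()
    while token is not None:
        more, token = fetch_page(token)
        items.extend(more)
    return items


first_page_only, _ = fetch_page()
print(len(first_page_only), "items without paging")
print(len(fetch_all()), "items with paging")
```

A caller that stops after `fetch_page()` sees only the first `PAGE_SIZE` items, which is the failure mode described in point 6.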

Anyhow, I think we've made a lot of progress since the 1990s. :)


A few comments:

* It is often recommended to use cpanminus[1] instead of the CPAN.pm module. But it is up to the distribution you try to install to declare its dependencies correctly. Not doing that is a bug.

* If you use cpanminus you can use the --notest flag to skip tests. But tests are a feature.

* Software has bugs. Reporting them when they are found is how software gets fewer bugs.

* CPAN distributions should not[2] use external binaries (and exceptions should be clearly documented and motivated).

* The ease of use of regexes in Perl is not an argument for not documenting them and (in this case) the document format they are meant to parse.

* There are several different data dumpers. No assumption on the user's preference is made.

* If your code declares that it requires a newer Perl (`use 5.012;` or later) you get strict enabled automatically[3], and also (depending on which version your code requires) some new features. Due to backwards compatibility it is not possible for newer Perls to enable strict or warnings implicitly for code that makes no such declaration.

The Perl of today is also vastly improved since the 1990s. Hopefully you will come across some modern perl too.

[1] https://metacpan.org/pod/App::cpanminus

[2] https://www.ietf.org/rfc/rfc2119.txt

[3] https://metacpan.org/pod/release/JESSE/perl-5.12.0/pod/perl5...

I think the difference is that in other languages I don't have to think about these things any more than I think about what IRQ my sound card is on.

In the CPAN case, if cpanminus is the "good one", then it should be installed by default and CPAN.pm needs to tell you to use that instead or just be deprecated. I don't want 5 choices in package managers, I just want the good one. :)

One factor that sometimes leads to problems in this regard is (as mentioned) backwards compatibility. Pretty much nothing that once has worked can be removed or changed because somewhere mission-critical software depends on it.

Another issue is discoverability. A concrete example is that https://metacpan.org/ is a much better (imho) presentation of cpan than http://search.cpan.org/.

It is the curse of being a very stable language and ecosystem.

> Perl devs don't think twice about shelling out to an external binary

No, most of them do. The Perl ecosystem has a killer feature called cpantesters, which lets everyone see which modules work on which systems out of the box. You should always check the cpantesters matrix before choosing a particular dependency.

> Even if regexs are not needed

They got overly complicated over the years, but they are needed. They are DSLs to make things easier when working with strings. I.e. so you wouldn't have to write 20 lines of hard to grasp code with bytes.Index(), bytes.HasSuffix(), bytes.TrimRight(), etc., like people do in Go, but a single nice regexp instead, which reduces your chances of making a mistake in that code.
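
A rough illustration of that trade-off, sketched in Python rather than Perl or Go: the same header-line parser written once with index/suffix/trim calls and once as a single regex. The function names and the header format are assumptions made up for this example.

```python
import re


def parse_header_manual(line):
    """Hand-rolled version: suffix check, index lookup, trimming."""
    if line.endswith("\r\n"):
        line = line[:-2]
    colon = line.find(":")
    if colon <= 0:
        return None  # no colon, or empty name
    name = line[:colon]
    if any(ch.isspace() for ch in name):
        return None  # names may not contain whitespace
    value = line[colon + 1:].strip(" \t")
    return name, value


# Regex version: name, colon, optional padding, value, optional CRLF.
HEADER_RE = re.compile(r"\A([^\s:]+):[ \t]*(.*?)[ \t]*(?:\r\n)?\Z")


def parse_header_regex(line):
    match = HEADER_RE.match(line)
    return match.groups() if match else None


for sample in ["Content-Length: 42\r\n", "Host:example.org", "no colon here"]:
    print(repr(sample), parse_header_manual(sample), parse_header_regex(sample))
```

Which of the two is easier to get right is exactly what the thread is arguing about; the sketch only shows that both shapes express the same rule.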

> so you wouldn't have to write 20 lines of hard to grasp code with bytes.Index(), bytes.HasSuffix(), bytes.TrimRight(), etc., like people do in Go

Go has regexps, and a very good implementation at it.

Depending on what you do and on the specific code-path, compiling and/or executing a regexp might be slower than manually parsing the string. The Go standard library is pretty concerned with performance (much more than Python's or Ruby's, for instance), so it tends to avoid regexps.

It shouldn't be like that, that's the problem. Regular expressions should be compiled into native code and be even faster than a bunch of hand-written bytes.HasSuffix() combinations.

Your previous post said that they are a very useful DSL for Perl so that "people don't have to do like they do in Go".

Both Perl and Go implement regexps, and neither of them compiles them to native code. So I don't get your previous comment at all.

The main difference is that, in Perl, if you ever had to write manual string parsing, it would be much much slower than using regexps as Perl is an interpreted language. So regexps are needed to perform fast string parsing. In Go, you have regexps if you want, or you can go even faster if you feel it's required.

> Both Perl and Go implement regexps, and neither of them compiles them to native code. So I don't get your previous comment at all.

Ok, I'll try to explain.

People feel discouraged from using regexps in Go, because they are very slow for many typical parsing and validating cases and require an extra compilation step, with all of the additional code complexity associated with that. So people do parsing manually instead, with all of its problems. It's not that they need that performance (almost no one does), but the whole idea behind regular expressions ends up not working: parsing code is still bad most of the time.

You've made me curious: Is there a language out there which does this, i.e. compiles Regex down to native code which is then as fast/faster than hand-coded bytes.hasSuffix(..) calls?
I found this with a bit of searching and clicking around on Stackoverflow: https://www.colm.net/open-source/ragel/ (via http://stackoverflow.com/a/15608037).

I didn't look long enough to know if there's an easy way to convert a regular expression to Ragel syntax.

> Go has regexps, and a very good implementation at it.

In my experience, porting code from Perl to Go, Go's regexp package is vastly inferior to Perl's in multiple areas: speed, memory, Unicode handling (e.g. \b works on ASCII only in Go), etc. For example, for some large regexps handling URL blacklists, reduced programmatically with Perl's awesome regexp assembly tools, I had to rely on PCRE in the end. Go just could not cope with that (not even the C++ re2). I do avoid regexps, regexps are usually best avoided, and all that, but there are areas in which they are by far the best option. In those areas, I postulate, from my own experience, that Perl's implementation is king. Speed, memory usage, Unicode.

> (not even the c++ re2)

Did you try using RE2's "set" functionality?

No, I did not get that far. It would've meant a larger rewrite of the ecosystem: the data files were created by other tools, already in "alternate form" [1], and needed to be used by other programs as well. I stopped trying to load them with re2 (both Go and C++) after glancing over all those gigabytes of RSS, while Perl kept them in the 200-300 MB range. PCRE was a good compromise at the time, but with other tradeoffs, because C libs seem to be frowned upon in the Go community, i.e. semi-official voices arguing how best to avoid them. :/ (e.g. blocking inside C isn't under the gomaxprocs limit, costly overhead crossing the C boundaries, static binary troubles, less portability and so on)

[1]

    perl -MRegexp::Assemble -E'my @list = qw< foo fo0z bar baz >; my $rx = Regexp::Assemble->new->add( @list )->re; say $rx'

    (?^:(?:fo(?:0z|o)|ba[rz]))
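
For the curious, the core idea (merge common prefixes into a trie, then emit nested alternations) fits in a few lines. This is a simplified Python sketch, not Regexp::Assemble's actual algorithm; `assemble` is a hypothetical helper that handles literal strings only.

```python
import re


def assemble(words):
    """Build one regex source that matches exactly the given literal words."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # the empty key marks "a word ends here"

    def emit(node):
        ends_here = "" in node
        branches = [
            re.escape(ch) + emit(node[ch])
            for ch in sorted(key for key in node if key != "")
        ]
        if not branches:
            return ""
        if len(branches) > 1 and all(len(b) == 1 for b in branches):
            body = "[" + "".join(branches) + "]"  # plain single characters
        elif len(branches) == 1:
            body = branches[0]
        else:
            body = "(?:" + "|".join(branches) + ")"
        if ends_here:
            body = "(?:" + body + ")?"  # a shorter word may stop at this node
        return body

    return emit(trie)


words = ["foo", "fo0z", "bar", "baz"]
pattern = re.compile(assemble(words))
print(pattern.pattern)
print([w for w in words + ["fo", "fooz"] if pattern.fullmatch(w)])
```

Because every branch at a trie node starts with a different character, the matcher never has to try more than one alternative per position, which is part of why assembled patterns tend to behave well on large word lists.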

cpantesters looks very useful. [1]

I wonder if there's anything like that for Python and Ruby.

[1]: for example, http://cpantesters.org/author/D/DAMOG.html

Less code is generally better. But I've noticed a lot of folks still using ^ or $ when what they really mean is \A or \z

What's the difference? ^ and $ are basically all I remember from when I read Mastering Regular Expressions

\A and \z always match beginning/end of the string.

^ and $ can be changed to mean beginning/end of each line in the string with the /m flag.
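
The same distinction, sketched in Python, whose `re` module behaves like Perl here with one spelling difference: Perl's `\z` is written `\Z` in Python, and `/m` is `re.MULTILINE`. The two helper functions are hypothetical validators for a lowercase-only name.

```python
import re


def loose_is_name(text, multiline=False):
    """Validate with ^ and $, optionally in multiline mode (Perl's /m)."""
    flags = re.MULTILINE if multiline else 0
    return re.search(r"^[a-z]+$", text, flags) is not None


def strict_is_name(text):
    """Validate with \\A and \\Z, which always mean start/end of the string."""
    return re.search(r"\A[a-z]+\Z", text) is not None


# $ tolerates one trailing newline even without the multiline flag.
print(loose_is_name("root\n"), strict_is_name("root\n"))

# In multiline mode a single good line is enough to pass the loose check.
print(loose_is_name("root\nrm -rf /", multiline=True),
      strict_is_name("root\nrm -rf /"))
```

In both cases the loose check accepts the input and the strict one rejects it, which is why \A and \z are the safer choice when the intent is "the whole string".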

>8) Even if regexs are not needed, inevitably the perl dev will use them since that's the perl hammer, and it's hard to know what the intention is with regexes or what the source data even looks like

I'm going to disagree with this one. There are lots of things in any language where it can be hard to see, at a glance, what the intention of the programmer was. That's why we have commenting. You're supposed to comment your blocks of code so that someone else can look at it and understand what that block of code is supposed to do.

Unfortunately, as far as I can tell by looking at other people's code, I appear to be one of the only programmers on the planet who actually uses comments....

Ideally the code itself should communicate that intent. And comments can become obsolete as code changes. Hence the movement to reduce comments to only what's necessary.

1. What? (Anyway, use cpanminus these days.)

2. Again, what?

3. Nope, there are a variety of tools available. Try `cpanm Perl::PrereqScanner::App` followed by `scan-perl-prereqs .`

4. Yeah, you can skip test runs with `cpanm --notest`, but do you really want to? As for the subsequent complaint, you're clearly having an experience I don't have.

5. Again, see cpanm.

6. Can't comment on this one.

7. Umm, that's a code smell. From CPAN that outcome is rare.

8. You use regexes when you need certain kinds of things done fast. Don't forget the `/x` flag to ensure it's documented if it's a non-trivial regex.

9. Actually I spend most of my time in the perl debugger. Older perl codebases do suffer from the magic payload pattern quite a lot. Modern perl, less so.

10. Yeah, I agree, one should probably have to explicitly turn off warnings and strict, but whatever.
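
On point 8, here is roughly what a regex documented with `/x` looks like. The sketch uses Python's `re.VERBOSE`, which plays the same role as Perl's `/x`: whitespace in the pattern is ignored and `#` starts a comment. The log format and the name `LOG_LINE` are made up for illustration.

```python
import re

# Verbose mode lets the pattern carry its own documentation.
LOG_LINE = re.compile(r"""
    \A
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})        # client IPv4 address, not range-checked
    \s+
    \[(?P<when>[^\]]+)\]                   # timestamp inside square brackets
    \s+
    "(?P<method>[A-Z]+)\s(?P<path>\S+)"    # quoted request: method and path
    \Z
""", re.VERBOSE)

match = LOG_LINE.match('127.0.0.1 [10/Oct/2000:13:55:36] "GET /index.html"')
if match:
    print(match.groupdict())
```

The comments also answer the original complaint about not knowing what the source data looks like: each piece of the pattern says which part of the line it expects.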

Anyway I agree, perl has made huge progress since the 1990s. I also agree there's a problem with discoverability in some parts of the cpan ecosystem. Be sure to read the Modern Perl book next time you need to do some perl work. You ought to be pleasantly surprised. Personally with the Moo(se)? family of modules, I enjoy having a multiparadigm language with reasonable optional runtime typing to keep me sane. My biggest complaint is the reference counted garbage collection.

> 1) The perl cpan module doesn't resolve dependencies

What? CPAN absolutely does.

> 2) The cpan module has parsing errors when passing in a list of CPAN packages

Both from the command line and in the CPAN shell itself, I can install a list of modules like so:

    cpan Data::Dumper Devel::Confess

    install Data::Dumper Devel::Confess

> 3) You have to manually grep your perl code to see what modules it depends on

Or you can use a CPAN module for that.
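
As a rough idea of what such a scanner does, here is a deliberately naive Python sketch that pulls module names out of `use` and `require` lines. It is an illustration only: `scan_prereqs` is a hypothetical helper, it treats every all-lowercase name as a pragma (an assumption that is wrong for some real modules), and real scanners handle far more cases.

```python
import re

# Matches "use Foo::Bar" / "require Foo::Bar" at the start of a line.
USE_RE = re.compile(
    r"^\s*(?:use|require)\s+([A-Za-z_]\w*(?:::\w+)*)", re.MULTILINE
)


def scan_prereqs(source):
    """Return the sorted set of module names a Perl source string loads."""
    found = set()
    for name in USE_RE.findall(source):
        if name == name.lower():
            continue  # assumed pragma (strict, warnings) or version (v5)
        found.add(name)
    return sorted(found)


perl_source = """\
use strict;
use warnings;
use v5.12;
use Data::Dumper;
# use Commented::Out;
    require LWP::UserAgent;
use Data::Dumper qw(Dumper);
"""

print(scan_prereqs(perl_source))
```

Modules loaded dynamically (for example through a string passed to `require`) are invisible to this kind of line matching, which is one reason to prefer a purpose-built tool.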

> 4) Module installs take a long time since they can compile and unit test the code

Or you just install them like this, if you're confident in your system:

    install Data::Dumper Devel::Confess

> 5) Non-interactive installs of CPAN modules require digging in the docs

Non-interactive installs should be using your operating system's package manager, unless you have a special use-case, in which some doc digging is fine.

> 6) CPAN modules aren't used that heavily and can have bugs that would be caught in wider used modules.

You mean "Some CPAN modules".

> 7) Perl devs don't think twice about shelling out to an external binary (that may or may not be installed)

Again, some.

> 8) Even if regexs are not needed, inevitably the perl dev will use them since that's the perl hammer

Eh, fair enough.

> 9) You have to manually include the DataDumper package to debug data structs

Data::Dumper was first released with perl 5.005.

> 10) You have to manually enable warnings and strict check, it's not on by default.

Same in JS, and similar with other languages.

> Anyhow, I think we've made a lot of progress since the 1990s. :)

Not really sure, the trolling culture seems to still be the same as back then.

Regarding the module dependency woes, check out Carton (https://metacpan.org/pod/Carton).