by brakl 3766 days ago
> Go has regexps, and a very good implementation at it.

In my experience porting code from Perl to Go, Go's regexp package is vastly inferior to Perl's in several areas: speed, memory use, and Unicode handling (e.g. \b is ASCII-only in Go). For example, for some large regexps handling URL blacklists, reduced programmatically with Perl's awesome regexp assembly tools, I had to fall back on PCRE in the end; Go just could not cope with them (and neither could the C++ RE2). I do avoid regexps where I can, since they are usually best avoided, but there are areas in which they are by far the best option. In those areas, I postulate from my own experience, Perl's implementation is king: speed, memory usage, Unicode.
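To make the \b point concrete, here is a minimal sketch using only Go's standard regexp package. The helper name wordBounded is made up for illustration. Go treats only [0-9A-Za-z_] as word characters for \b, so an accented letter next to an ASCII word looks like a boundary, and a word that starts and ends with an accented letter has no boundary around it at all:

```go
package main

import (
	"fmt"
	"regexp"
)

// wordBounded reports whether word occurs in text with \b on both sides.
// Hypothetical helper, named for this example only.
func wordBounded(word, text string) bool {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(word) + `\b`)
	return re.MatchString(text)
}

func main() {
	// "é" is not an ASCII word character, so Go sees a boundary
	// between "é" and "f" and reports a match inside "éfoo".
	fmt.Println(wordBounded("foo", "éfoo")) // true

	// Neither the space nor "é" is an ASCII word character, so there
	// is no boundary before "été" and the match fails.
	fmt.Println(wordBounded("été", "un été chaud")) // false
}
```

With Unicode semantics enabled, Perl would be expected to give the opposite answer in both cases, since it counts "é" as a word character.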

1 comment

> (not even the c++ re2)

Did you try using RE2's "set" functionality?

No, I did not get that far. It would have meant a larger rewrite of the ecosystem: the data files were created by other tools, already in "alternate form" [1], and needed to be used by other programs as well. I stopped trying to load them with RE2 (both Go and C++) after glancing at all those gigabytes of RSS, while Perl kept them in the 200-300 MB range. PCRE was a good compromise at the time, but it came with other tradeoffs, because C libs seem to be frowned upon in the Go community, i.e. semi-official voices arguing how best to avoid them. :/ (e.g. blocking inside C isn't covered by the GOMAXPROCS limit, crossing the C boundary has costly overhead, static binaries get troublesome, portability suffers, and so on)

[1] perl -MRegexp::Assemble -E'my @list = qw< foo fo0z bar baz >; my $rx = Regexp::Assemble->new->add( @list )->re; say $rx'

(?^:(?:fo(?:0z|o)|ba[rz]))