

by yellowapple 3425 days ago

So I actually found an answer to your question re: backtracking: https://docs.perl6.org/language/faq#What%27s_the_difference_...?

Basically: Perl 6 tokens and rules imply :ratchet, which means no backtracking. You can use raw regexes if for some reason you do need backtracking, but otherwise it looks like a Perl 6 grammar is backtracking-free (and grammars in the real world hopefully use tokens/rules exclusively).
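To make the :ratchet point concrete, here is a rough Python sketch (not Perl 6, and not a claim about how Rakudo implements anything). It compares an ordinary backtracking match with a hand-rolled "never give characters back" match for the pattern "one or more lowercase letters, then a fixed suffix". The function names are made up for illustration.

```python
import re


def match_backtracking(text, suffix):
    # Ordinary regex semantics: [a-z]+ may give back characters
    # so that the trailing suffix can still match.
    return re.fullmatch(r"[a-z]+" + re.escape(suffix), text) is not None


def match_ratchet(text, suffix):
    # Ratchet-style semantics (hypothetical sketch): [a-z]+ consumes as much
    # as it can and never gives anything back.
    pos = 0
    while pos < len(text) and "a" <= text[pos] <= "z":
        pos += 1
    if pos == 0:
        return False  # the + quantifier needs at least one letter
    # Whatever is left must be exactly the suffix; no retrying shorter runs.
    return text[pos:] == suffix


print(match_backtracking("abc1", "1"), match_ratchet("abc1", "1"))  # True True
print(match_backtracking("aab", "b"), match_ratchet("aab", "b"))    # True False
```

The second case is the interesting one: the letter run swallows the final "b", and without backtracking there is no way to hand it back. That is the trade-off you accept with tokens/rules, in exchange for each position being attempted only once per subpattern.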

So it might actually be at least tolerable for the problems you need to solve (though it's hard for me to say without knowing the problems in the first place (: ). The main remaining question is the exact algorithm in use; I'd be very surprised if Perl 6 grammars didn't compile down to at least some variation on packrat parsers, which would mean linear complexity, but this is probably - again - implementation-dependent in the "we don't care what you do so long as it passes the Perl 6 test suite" sense. I've yet to find a definitive answer by spelunking through Rakudo's code, but it's reassuring that even Rakudo's grammar for Perl 6 itself seems to be devoid of backtracking (meaning that it's clearly possible to do without it; there are quite a few generic subs/methods in there, though, which could prove me wrong here).


Ultimatt 3424 days ago

Unfortunately, the regex/grammar engine is one of those components that lacks deep optimisation at the moment. That's probably a much bigger factor than anything algorithmic.

Another nice thing to note about tokens is longest token matching, along with the concept of "proto" tokens and regexes. This lets you have simple decision making between similarly defined tokens without backtracking. For example, the grammar I have for biological sequences can simultaneously identify and parse DNA/RNA/Protein without backtracking. Even if a file has a mixture of data, I can instantiate the correct subclasses on the fly whilst parsing! https://github.com/MattOates/BioInfo/blob/master/lib/BioInfo...
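For anyone who hasn't seen longest token matching before, the idea can be sketched roughly like this in Python. This is an illustration only: the alphabets, the names, and the tie-breaking rule (first listed alternative wins) are assumptions for the sketch, not the actual BioInfo grammar or Perl 6's exact rules.

```python
import re

# Hypothetical alternatives, listed most specific first.
ALTERNATIVES = [
    ("dna", re.compile(r"[ACGT]+")),
    ("rna", re.compile(r"[ACGU]+")),
    ("protein", re.compile(r"[A-Z]+")),
]


def classify(sequence):
    """Try every alternative at position 0 and keep the longest match."""
    best_name, best_len = None, 0
    for name, pattern in ALTERNATIVES:
        m = pattern.match(sequence)
        length = m.end() if m else 0
        # Strict > means the earlier alternative wins a tie (sketch assumption).
        if length > best_len:
            best_name, best_len = name, length
    return best_name, best_len


print(classify("ACGT"))  # ('dna', 4)
print(classify("ACGU"))  # ('rna', 4)
print(classify("MKVL"))  # ('protein', 4)
```

Each alternative is tried once from the same starting point and the longest one is chosen, so nothing is ever re-parsed after a wrong guess. The name of the winning alternative is what you would then use to pick the matching subclass.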


yellowapple 3423 days ago

I reckon the optimizations will come soon (at least that's what I gathered from reading the Perl 6 documentation's "Performance" section); now that Perl 6.c is out in the wild, there's less of a moving target, so the implementations can start (and apparently are starting) to focus on squeezing out more performance.

And yeah, proto regexes are pretty sweet, and they seem to be a natural fit for what you're doing. I'm always surprised by how popular Perl seems to be in biology / life sciences, and projects like BioInfo (and BioPerl, of course) are a great reminder as to why that happens to be.
