| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Ultimatt 3417 days ago
	Unfortunately the regex/grammar engine is one of those components lacking deep optimisation. At the moment. Thats probably a much bigger factor than anything algorithmic. With tokens another nice thing to note is there is longest token matching and the concept of "proto" tokens and regexes. This lets you have simple decision making between similarly defined tokens without backtracking. For example the grammar I have for biological sequences can simultaneously identify and parse DNA/RNA/Protein without back tracking. Even if a file has a mixture of data I can instantiate the correct subclasses on the fly whilst parsing! https://github.com/MattOates/BioInfo/blob/master/lib/BioInfo...

1 comments

yellowapple 3416 days ago

I reckon the optimizations will come soon (at least that's what I gathered from reading the Perl 6 documentation's "Performance" section); now that Perl 6.c is out in the wild, there's less of a moving target, so the implementations can (and apparently are) starting to focus more on squeezing out more performance.

And yeah, proto regexes are pretty sweet, and they seem to be a natural fit for what you're doing. I'm always surprised by how popular Perl seems to be in biology / life sciences, and projects like BioInfo (and BioPerl, of course) are a great reminder as to why that happens to be.