| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Drup 2692 days ago

This is cute! It's basically a nice syntax for full parsing of extended regular expressions. It would make for a very nice tool. Far too often, regular expression tools only implement "matching" (i.e., extract a list of strings). Full parsing gives you the complete parsetree, and thus preserve more structural information, especially under repetitions.

I'm very much in support of anything that uses regex parsing instead of matching. Matching is nearly always the wrong tool and cause more bugs than anything. The only reason it's used is that the engines are easier to implement. I wrote a small paper on how to retrofit parsing on top of an engine that only gives you matching recently (https://gabriel.radanne.net/papers/tyre/tyre_paper.pdf).

Given the amount of prior art in the academic community, patenting this is probably worthless, but eh ...

2 comments

perlgeek 2692 days ago

The Perl 6 grammar engine does full parsing, and gives you the parse tree that matches the structure of the rules.

link

Drup 2692 days ago

Well, except Perl doesn't parse regular grammars (it parses much more) and is far from being "one pass" (since the complexity guarantees are not valid anymore) ....

link

tux1968 2692 days ago

Regular grammars a strict subset of what Perl 6 will parse, no? If you stick to that subset it will parse in a single pass.

link

Drup 2692 days ago

Except Perl 6 doesn't enforce it, so you have to guess yourself if you are really in the regular case. Additionally, PCRE used to be exponential for certain "bad cases", some of which were (truly) regular expressions.

My point is simply that Perl doesn't give you any guarantee except "we will try to parse it". Maybe you will hit the right case for the right version of Perl, who knows ? The Web is full of DDOS attack based on exponential PCREs.

With actual regular expression, the complexity is guaranteed.

link

perlgeek 2692 days ago

> My point is simply that Perl doesn't give you any guarantee except "we will try to parse it".

Perl 6 gives you pretty good control over where to use backtracking and where not. It parses regular languages with a NFA or DFA, and can switch to a backtracking engine.

More details in https://www.apress.com/us/book/9781484232279 (sorry for the plug, wrote it myself).

link

pawelmurias 2692 days ago

The way regular expressions work in Perl doesn't randomly degrade across Perl versions. For a given regex you can determine what it's complexity would be across versions just like you would for an arbitrary chunk of Perl code. It's true you don't always get a complexity guarantee for free like in a DFA implementation of formal regular expressions.

link

b2gills 2689 days ago

Actually Perl6 does enforce it if you tell it to. The following fails to match.

    'abc' ~~ m:ratchet/ .* c /

The `.*` gobbles everything up, which means there is nothing left for `c`

When writing grammars, use `token` which turns on `:ratchet` for you.

link

cestith 2692 days ago

PCRE isn't Perl.

link

centerOfMass 2692 days ago

This might be the real value from this thread.

Any more resources on this? Is this the same as "Parsing expression grammar"?

link

Drup 2692 days ago

No, Parsing expression grammar (PEG) are not regular grammars at all.

For more resources, I think the related work of my paper has the main ones. The "Kleene meets Church" project has lot's of very good publications on the topic: https://di.ku.dk/kmc/publications/

link