Hacker News new | ask | show | jobs
by Cu3PO42 2694 days ago
The claims seem dubious to me as well. What exactly does faster mean here? Since a speedup of two orders of magnitude is mentioned I'm assuming it refers to matching speed.

Is it faster than PCRE? Is it faster than some other engine or is the claim that it's faster than any available RegEx engine? I had a quick look at the reference and this looks like it will accept context-free languages (since recursion is allowed). I strongly doubt that a CFG parser is magically faster than any RegEx engine.

> It is hundreds of times faster than the well-known regular expressions, and the speed doesn’t degrade linearly when you add patterns.

The author seems to believe that matching time for RegExes is linear in the time of the pattern? Once compiled to a DFA, the pattern only has negligible influence on the matching time.

EDIT: I've been trying to figure out what kind of parser this generates. It might be a LR(k) parser? It breaks on the following (admittedly contrived) example:

  #S = E + End;
  E = {E + "+" + E, E + "*" + E, T};
  T = {[1+]Alpha, N + "()"};
  N = {P, Q};
  P = {"a" + P, "ab"};
  Q = [1+]"a";
with an input of "aaaaaaab()+beta*c"
1 comments

> Once compiled to a DFA, the pattern only has negligible influence on the matching time.

With the right optimisations, certain longer patterns can be much faster to scan for, on account of the Boyer-Moore approach.

If I asked you to search through a book for 10 consecutive pages of the letter 'Q', you wouldn't need to check every page. The same optimisation can be applied to regex. (Not that most regex implementations bother to do it, though.)

You're right. What I meant to say is that for any (formal) regular expression, we can compile it to a DFA and then match any strings in time linear in the length of the string.

Certainly the pattern length matters both for pre-processing (compiling to DFA) and the runtime in a Boyer-Moore approach. However, as you mentioned in the Boyer-Moore average case of Θ(n/m) a longer pattern is faster, rather than slower as the page on Nevod seems to imply.