Hacker News new | ask | show | jobs
by UK-AL 4139 days ago
Regular expressions are high performance if you use automata style(Regular Language) regular expressions, which limits the use of some of the features you can use.

Modern regular expression engines in a lot of languages, actually go beyond the expressiveness of a regular language. This is what damages performance.

There is no reason why this would reduce performance... if its not doing anything crazy.

If anything your taking work away from it. Your building the tree directly here, where as parser would normally build a tree from the string. But since this is integrating into the languages RE library i'm guessing its writing that tree as a string, which is then passed into the regular expression engine, to be turned into a tree again :)

2 comments

I guess it depends on your definition of "high performance."

If a regular expression runs too often, even pre-compiled (as they should be), you'll want to replace them with code written in the native language. I've gone in and replaced a one line search/replace written in RegX (compiled), with just a C-style for() loop over the wchar array, and had the memory usage drop by near 80% and performance increase by over 60%.

So high performance is all relative. However RegX isn't something I'd describe that way, even compiled. It is a nice way to write complex string parsing code quickly however.

If your regex is replaceable by a simple "find_substring" or equivalent, it's slow.

If your regex is complicated, it will probably beat any naive attempt to write it into conventional string processing, short of reimplementing regexs in the first place. Especially since in many languages, "conventional string processing" may involve the creation of lots of copies and sub-copies.

> you'll want to replace them with code written in the native language

Probably not true for Javascript (and other scripted languages) - matching regex uses native and highly optimized regex lib, which will usually be orders of magnitude faster than implementing this in the language.

That isn't relevant in this context as the library linked couldn't be integrated into JavaScript.
Sorry, which library do you mean? The OP is a javascript library..

I just wanted to point out that regex is much faster in javascript than doing things 'by hand'.

A regular expression implemented as a DFA would literally be looping over the string, and a state transition table. I don't see how performance could be bad.

It is highly dependent on the regular expression engine you use, most don't use automata because of extra features.

Perl 6 unifies "regexes" and recursive descent grammars at the syntax level and then compiles them to a hybrid of DFAs, NFAs, and regular code as necessary. The idea is to maintain the simplicity of simple regexes and of parser combinators for the user but retain as much of the performance benefits of true DFA-able regular expressions as is possible.

At least, that's the theory. In practice, while the benefit of syntactic usability is available today, the Perl 6 rules engine is still very slow and it'll likely take years to optimize the heck out of this approach and really harvest the performance benefit.

I wasn't sure to be impressed or horrified when I learned that Perl supported recursive regular expressions.