| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by UK-AL 4187 days ago

Regular expressions are high performance if you use automata style(Regular Language) regular expressions, which limits the use of some of the features you can use.

Modern regular expression engines in a lot of languages, actually go beyond the expressiveness of a regular language. This is what damages performance.

There is no reason why this would reduce performance... if its not doing anything crazy.

If anything your taking work away from it. Your building the tree directly here, where as parser would normally build a tree from the string. But since this is integrating into the languages RE library i'm guessing its writing that tree as a string, which is then passed into the regular expression engine, to be turned into a tree again :)

2 comments

UnoriginalGuy 4187 days ago

I guess it depends on your definition of "high performance."

If a regular expression runs too often, even pre-compiled (as they should be), you'll want to replace them with code written in the native language. I've gone in and replaced a one line search/replace written in RegX (compiled), with just a C-style for() loop over the wchar array, and had the memory usage drop by near 80% and performance increase by over 60%.

So high performance is all relative. However RegX isn't something I'd describe that way, even compiled. It is a nice way to write complex string parsing code quickly however.

link

jerf 4187 days ago

If your regex is replaceable by a simple "find_substring" or equivalent, it's slow.

If your regex is complicated, it will probably beat any naive attempt to write it into conventional string processing, short of reimplementing regexs in the first place. Especially since in many languages, "conventional string processing" may involve the creation of lots of copies and sub-copies.

link

d4n3 4186 days ago

> you'll want to replace them with code written in the native language

Probably not true for Javascript (and other scripted languages) - matching regex uses native and highly optimized regex lib, which will usually be orders of magnitude faster than implementing this in the language.

link

UnoriginalGuy 4186 days ago

That isn't relevant in this context as the library linked couldn't be integrated into JavaScript.

link

d4n3 4186 days ago

Sorry, which library do you mean? The OP is a javascript library..

I just wanted to point out that regex is much faster in javascript than doing things 'by hand'.

link

UK-AL 4187 days ago

A regular expression implemented as a DFA would literally be looping over the string, and a state transition table. I don't see how performance could be bad.

It is highly dependent on the regular expression engine you use, most don't use automata because of extra features.

link

raiph 4186 days ago

Perl 6 unifies "regexes" and recursive descent grammars at the syntax level and then compiles them to a hybrid of DFAs, NFAs, and regular code as necessary. The idea is to maintain the simplicity of simple regexes and of parser combinators for the user but retain as much of the performance benefits of true DFA-able regular expressions as is possible.

At least, that's the theory. In practice, while the benefit of syntactic usability is available today, the Perl 6 rules engine is still very slow and it'll likely take years to optimize the heck out of this approach and really harvest the performance benefit.

link

Blackthorn 4187 days ago

I wasn't sure to be impressed or horrified when I learned that Perl supported recursive regular expressions.

link