| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by UnoriginalGuy 4187 days ago

Looks like Linq (from .Net/C#). Pretty sexy way to write Regular Expressions if you ask me.

I've "learned" regular expressions multiple times but it just never sticks, I have no idea why. It certainly doesn't help that there are several different incompatible syntaxes (so what I remember and think "should" work doesn't).

I'd prefer to write RegX's in this style, however I would pay attention to performance (not that Regular Expressions are high performance, however I wouldn't want to see a large performance loss either).

4 comments

UK-AL 4187 days ago

Regular expressions are high performance if you use automata style(Regular Language) regular expressions, which limits the use of some of the features you can use.

Modern regular expression engines in a lot of languages, actually go beyond the expressiveness of a regular language. This is what damages performance.

There is no reason why this would reduce performance... if its not doing anything crazy.

If anything your taking work away from it. Your building the tree directly here, where as parser would normally build a tree from the string. But since this is integrating into the languages RE library i'm guessing its writing that tree as a string, which is then passed into the regular expression engine, to be turned into a tree again :)

link

UnoriginalGuy 4186 days ago

I guess it depends on your definition of "high performance."

If a regular expression runs too often, even pre-compiled (as they should be), you'll want to replace them with code written in the native language. I've gone in and replaced a one line search/replace written in RegX (compiled), with just a C-style for() loop over the wchar array, and had the memory usage drop by near 80% and performance increase by over 60%.

So high performance is all relative. However RegX isn't something I'd describe that way, even compiled. It is a nice way to write complex string parsing code quickly however.

link

jerf 4186 days ago

If your regex is replaceable by a simple "find_substring" or equivalent, it's slow.

If your regex is complicated, it will probably beat any naive attempt to write it into conventional string processing, short of reimplementing regexs in the first place. Especially since in many languages, "conventional string processing" may involve the creation of lots of copies and sub-copies.

link

d4n3 4186 days ago

> you'll want to replace them with code written in the native language

Probably not true for Javascript (and other scripted languages) - matching regex uses native and highly optimized regex lib, which will usually be orders of magnitude faster than implementing this in the language.

link

UnoriginalGuy 4186 days ago

That isn't relevant in this context as the library linked couldn't be integrated into JavaScript.

link

d4n3 4186 days ago

Sorry, which library do you mean? The OP is a javascript library..

I just wanted to point out that regex is much faster in javascript than doing things 'by hand'.

link

UK-AL 4186 days ago

A regular expression implemented as a DFA would literally be looping over the string, and a state transition table. I don't see how performance could be bad.

It is highly dependent on the regular expression engine you use, most don't use automata because of extra features.

link

raiph 4186 days ago

Perl 6 unifies "regexes" and recursive descent grammars at the syntax level and then compiles them to a hybrid of DFAs, NFAs, and regular code as necessary. The idea is to maintain the simplicity of simple regexes and of parser combinators for the user but retain as much of the performance benefits of true DFA-able regular expressions as is possible.

At least, that's the theory. In practice, while the benefit of syntactic usability is available today, the Perl 6 rules engine is still very slow and it'll likely take years to optimize the heck out of this approach and really harvest the performance benefit.

link

Blackthorn 4186 days ago

I wasn't sure to be impressed or horrified when I learned that Perl supported recursive regular expressions.

link

amageed 4186 days ago

If you're interested in something similar for .NET / C#, check out my Regextra library, specifically the Passphrase Regex Builder: https://github.com/amageed/Regextra

As the name suggests though, the focus was on passphrase criteria and it wasn't to produce a DSL for general regex building. The library also supports named templates and a few utility methods.

link

Retra 4186 days ago

This is why I dislike the design of Linq. The pattern of chaining function calls to implement a DSL is common enough that they should have employed a general solution, not just a wonky SQL-specific version.

link

amageed 4185 days ago

LINQ isn't SQL-specific and does apply generally. It can be used against the standard .NET framework objects and collections. There are different LINQ focuses or flavors, and there are 2 ways to write queries. There's LINQ to Objects, LINQ to XML, LINQ to SQL (no longer actively maintained; nowadays Entity Framework is the Microsoft alternative), and you can write your own LINQ providers to target other purposes.

As for syntax, there's the fluent syntax (chained methods), and there's the query syntax which is syntactic sugar that gets compiled to the methods. The query syntax is probably the biggest reason people mistake LINQ for being SQL specific since it resembles SQL.

E.g.,

  var results = SomeCollection.Where(c => c.SomeProperty < 10)
                              .Select(c => new { c.SomeProperty, c.OtherProperty });

The same thing in query syntax:

  var results = from c in SomeCollection
                where c.SomeProperty < 10
                select new { c.SomeProperty, c.OtherProperty };

Then you can iterate over both the same way:

  foreach (var result in results)
  {
      Console.WriteLine(result);
  }

link

rripken 4186 days ago

Performance is unaffected. This provides a fluent and verbose way of building a regular expression. Users of the library then feed the built regular expression into their standard regular expression engine.

link