by noprompt 4685 days ago
> "But that's what a regular expression is used for - testing an arbitrary string for membership within the set of valid strings of the language formally described by the regular expression."

Formally, yes. And if a regular expression were always the fastest way to do this, I would encourage using this tool for it. However, depending on the data structure that holds the strings, it may be faster to simply search that structure directly.
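To make the comparison concrete, here is a minimal Python sketch of the two ways to answer the same membership question. The word list and the function names `in_set` and `in_regex` are made up for illustration, and which approach is faster will depend on the engine and on how the strings are stored.

```python
import re

# Hypothetical word list, used only for illustration.
WORDS = ["Clojure", "Clojars", "ClojureScript"]

# Option 1: keep the strings in a hash set and look the candidate up directly.
WORD_SET = set(WORDS)

# Option 2: compile one pattern that describes the same set of strings.
PATTERN = re.compile("|".join(re.escape(w) for w in WORDS))

def in_set(candidate: str) -> bool:
    """Membership test against the underlying data structure."""
    return candidate in WORD_SET

def in_regex(candidate: str) -> bool:
    """Membership test against the compiled pattern (whole string must match)."""
    return PATTERN.fullmatch(candidate) is not None
```

Both functions answer the same question; something like `timeit` on realistic data would show which one wins in a given setting.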

> "it is one character longer than what was produced"

It's not the length of the pattern that matters. Rather, it's the performance characteristics of the underlying state machine once the expression is compiled.

> "Why not use the simplest possible syntactic sugar?"

  (?:Clojure|Clojars|ClojureScript)

While this is certainly easier to read and understand, it has performance drawbacks on a backtracking NFA engine. It also has more states than the alternative:

  Cloj(?:ure(?:Script)?|ars)

Suppose I want to test whether "ClojureScript" is a member of the set of strings described by the first expression. To reach a final state I have to enter no fewer than 25 states and backtrack twice. With the second expression I only need to enter 13 states before reaching a final state, and I never backtrack.
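As a quick sanity check, the two expressions can be compared in Python, whose `re` module is a backtracking engine. This sketch only confirms that they accept the same strings when the whole input must match; it does not measure state counts or backtracking, and the names `FLAT`, `FACTORED`, and `accepts` are mine.

```python
import re

# The two expressions from the comment, compiled as-is.
FLAT = re.compile(r"(?:Clojure|Clojars|ClojureScript)")
FACTORED = re.compile(r"Cloj(?:ure(?:Script)?|ars)")

def accepts(pattern: re.Pattern, candidate: str) -> bool:
    """True when the entire candidate string is matched by the pattern."""
    return pattern.fullmatch(candidate) is not None

# Strings inside and outside the described set, for comparison.
SAMPLES = ["Clojure", "Clojars", "ClojureScript", "Cloj", "ClojureScrip", "Java"]

# Map each sample to the pair (accepted by FLAT, accepted by FACTORED).
RESULTS = {s: (accepts(FLAT, s), accepts(FACTORED, s)) for s in SAMPLES}
```

One thing to watch with an unanchored match: in Python the alternatives are tried left to right, so `FLAT.match("ClojureScript")` stops at "Clojure", while `FACTORED.match` consumes the whole string. Using `fullmatch` is what forces the first expression to back out of the earlier alternatives.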

For small patterns the choice to use something like frak is arguably splitting hairs; you won't gain much beyond not having to write the expression by hand. But for enormous patterns, like the one I share in the README, there are real benefits from the sort of optimization frak provides.
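For a rough idea of what this kind of optimization looks like, here is a small Python sketch that factors shared prefixes by building a trie and printing it back out as a pattern. This is not frak's actual implementation, only an illustration of the general idea; `build_trie` and `trie_to_pattern` are hypothetical names.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_pattern(node):
    """Emit a pattern for the trie, sharing every common prefix exactly once."""
    ends_here = "" in node
    # One alternative per outgoing character, in sorted order.
    branches = [re.escape(ch) + trie_to_pattern(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]  # a single path needs no group
    group = "(?:" + "|".join(branches) + ")"
    # If a word can end at this node, the remaining suffix is optional.
    return group + "?" if ends_here else group

PATTERN = trie_to_pattern(build_trie(["Clojure", "Clojars", "ClojureScript"]))
# PATTERN is "Cloj(?:ars|ure(?:Script)?)"
```

The result differs from the expression above only in the order of the alternatives, since this sketch sorts branches alphabetically.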