by brudgers
4697 days ago
> "I would not use this as a replacement to test the membership of a string in a collection of strings."

But that's what a regular expression is used for: testing an arbitrary string for membership in the set of valid strings of the language the regular expression formally describes. The power of a regular expression is that it can enumerate all the valid strings for me. If I have to list them explicitly, what have I gained?

To put it another way, the equivalent output to the example is:

    (?:foo|bar|baz|quux)

It is one character longer than what was produced:

    (?:ba[rz]|foo|quux)

but can reasonably be argued to be clearer.

What I was getting at with my pseudo-code example is that if the goal is to interpret the input down to the fewest possible states from examples, then the regular expression is redundant: all we need are the examples and `if`.

There's shorter syntax:

    > (frak/pattern [Clojure|Clojars|ClojureScript])
    #"(?:Clojure|Clojars|ClojureScript)"

Why not use the simplest possible syntactic sugar?

As I said in my first comment, I understand the reasons for creating Frak. I find thinking about it stimulating and illuminating. It provides a great jumping-off point around the issue of unpacking regular expressions and the terseness of their language.
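The claim that the two foo/bar/baz/quux patterns describe the same strings and differ in length by one character can be spot-checked. This is a minimal sketch in Python, whose `re` module stands in here for the Java engine behind Clojure's regex literals. The helper `accepts` and the sample non-members are illustrative assumptions, and a handful of samples is a sanity check, not a proof of equivalence.

```python
import re

explicit = r"(?:foo|bar|baz|quux)"   # every member listed out
factored = r"(?:ba[rz]|foo|quux)"    # the shorter pattern quoted above

members = ["foo", "bar", "baz", "quux"]
non_members = ["ba", "bat", "fo", "quuxx", ""]  # illustrative samples only

def accepts(pattern, s):
    # fullmatch tests membership of the whole string, not a substring search
    return re.fullmatch(pattern, s) is not None

# Both patterns should agree on every sample string.
same_language = all(
    accepts(explicit, s) == accepts(factored, s)
    for s in members + non_members
)

# Difference in pattern length, in characters.
length_difference = len(explicit) - len(factored)
```

On these samples the two patterns agree everywhere, and `length_difference` is 1.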
|
Formally, yes. And if it were always more performant to use a regular expression for this task, I would encourage the use of this tool to do so. However, depending on the data structure containing the strings, it may be more performant to simply search that.
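To make "simply search that" concrete, here is a minimal sketch in Python of the two membership tests side by side. The names `in_set` and `in_regex` are hypothetical, and Python's `re` is only a stand-in for the JVM engine. Which test is faster depends on the engine, the data structure and the size of the collection, so it has to be measured rather than assumed.

```python
import re

words = ["foo", "bar", "baz", "quux"]

# Option 1: keep the strings in a hash-based set and look the candidate up.
word_set = frozenset(words)

# Option 2: compile the same strings into one alternation.
word_re = re.compile("(?:" + "|".join(map(re.escape, words)) + ")")

def in_set(s):
    return s in word_set

def in_regex(s):
    # fullmatch so that "bazaar" is not accepted just for containing "baz"
    return word_re.fullmatch(s) is not None
```

Both functions give the same answers. Only their cost differs, and the standard `timeit` module is one way to compare them on real data.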
> "it is one character longer than what was produced"
It's not the length of the pattern that matters. Rather, it's the performance characteristics of the underlying state machine once the expression is compiled.
> "Why not use the simplest possible syntactic sugar?"
While this is certainly easier to read and understand, it will have performance drawbacks on an NFA engine where backtracking is a real thing. It will also have a larger number of states than the alternative. Suppose I am interested in testing whether "ClojureScript" is a member of the set of strings described by the first expression. To reach a final state I will have to enter no fewer than 25 states and backtrack twice. With the second expression I will only need to enter 13 states before reaching a final state, and I will not backtrack at all.

For small patterns the choice to use something like frak is arguably splitting hairs; you won't gain much beyond not having to write the expression yourself. But for enormous patterns, like the one I share in the README, there are real benefits from the sort of optimization frak provides.
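The kind of optimization being described, sharing common prefixes so the engine never re-reads "Cloj" for each alternative, can be sketched with a small trie. This is a hedged illustration in Python and not frak's actual algorithm or output. The functions `build_trie` and `trie_to_pattern` are hypothetical names, and the sketch does not count states or backtracks.

```python
import re

def build_trie(words):
    """Nested dicts keyed by character; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_pattern(node):
    """Emit a pattern in which every shared prefix appears only once."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_pattern(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""                      # leaf: nothing more to match
    if len(branches) == 1 and not ends_here:
        return branches[0]             # single path: no group needed
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if ends_here else body  # optional if a word ends here

words = ["Clojure", "Clojars", "ClojureScript"]
flat = "(?:" + "|".join(map(re.escape, words)) + ")"
factored = trie_to_pattern(build_trie(words))
# factored == "Cloj(?:ars|ure(?:Script)?)"
```

Both `flat` and `factored` accept exactly the three words, but in the factored form each input character is examined along a single path, which is the property the reply attributes to the optimized expression.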