| HN Mirror

my $string = 'AAABBBCCC'; say $string ~~ / ^ # match at least one A # store the result in a named sub-entry $<A> = [ A+ ] {} # update result object # create a lexical var named $repetition :my $repetition = $<A>.chars(); # <- embedded Raku syntax # match B and then C exactly $repetition times $<B> = [ B ** {$repetition} ] $<C> = [ C ** {$repetition} ] $ /;

I don't think we disagree here. To clarify, my statement about using "actual parsers" over regexes was more directed at my own library than Raku. Since I had just posted a link on how to "parse" media types using my library, I wanted to immediately follow that with a word of caution of "But don't do that! You shouldn't be using (traditional) regexes to parse! They are the wrong tool for that. How unfortunate it is that most languages have a super simple syntax for (traditional/PCRE) regexes and not for parsing." I had seen in the article that Raku had some sort of "grammar" concept, so I was kind of saying "oh it looks like Raku may be tackling that to."

Hopefully that clarifies that I was not necessarily making any statement about whether or not to use Raku regexes, which I don't pretend to know well enough to qualify to give advice around. Just for the sake of interesting discussion however, I do have a few follow up comments to what you wrote:

1. Aside from my original confusing use of the term "regexes" to actually mean "PCRE-style regexes", I recognize I also left a fair amount of ambiguity by referring to "actual parsers". Given that there is no "true" requirement to be a parser, what I was attempting to say is something along the lines of: a tool designed to transform text into some sort of structured data, as opposed to a tool designed to match patterns. Again, from this alone, seems like Raku regexes qualify just fine.

2. That being said, I do have a separate issue with using regexes for anything, which is that I do not think it is trivial to reason about the performance characteristics of regexes. IOW, the syntax "doesn't scale". This has already been discussed plenty of course, but suffice it to say that backtracking has proven undeniably popular, and so it seems an essential part of what most people consider regexes. Unfortunately this can lead to surprises when long strings are passed in later. Relatedly, I think regexes are just difficult to understand in general (for most people). No one seems to actually know them all that well. They venture very close to "write-only languages". Then people are scared to ever make a change in them. All of this arguably is a result of the original point that regexes are optimized for quick and dirty string matching, not to power gcc's C parser. This is all of course exacerbated by the truly terrible ergonomics, including not being able to compose regexes out of the box, etc. Again, I think you make a case here that Raku is attempting to "elevate" the regex to solve some if not all of these problems (clearly not only composable but also "modular", as well as being able to control backtracking, etc.) All great things!

I'd still be apprehensive about the regex "atoms" since I do think that regexes are not super intuitive for most people. But perhaps I've reversed cause and effect and the reason they're not intuitive is because of the state they currently exist in in most languages, and if you could write them with Raku's advanced features, regexes would be no more unintuitive than any other language feature, since you aren't forced to create one long unterminated 500-character regex for anything interesting. In other words, perhaps the "confusing" aspects of regexes are much more incidental to their "API" vs. an essential consequence of the way they describe and match text.

3. I'd like to just separately point out that many aspects of what you mentioned was added to regexes could be added to other kinds of parsers as well. IOW, "actual parsers" could theoretically parse Raku, if said "actual parsers" supported the discussed extensions. For example, there's no reason PEG parsers couldn't allow you to fall into dynamic sub-languages. Perhaps you did not mean to imply that this couldn't be the case, but I just wanted to make sure to point out that these extensions you mention appear to have much more generally applicable than they are perhaps given credit for by being "a part of regexes in Raku" (or maybe that's not the case at all and it was just presented this way in this comment for brevity, totally possible since I don't know Raku).

I'll certainly take a closer look at the full Raku grammar stuff since I've written lots of parser extensions that I'd be curious have analogues in Raku or might make sense to add to it, or alternatively interesting other ideas that can be taken from Raku. I will say that RakuAST is something I've always wanted languages to have, so that alone is very exciting!