| "Actual parsers" aren't powerful enough to be used to parse Raku. Raku regular expressions combined with grammars are far more powerful and, if written well, easier to understand than any "actual parser".

In order to parse Raku, an "actual parser" would have to let you add and remove rules while it is parsing. Raku's "parser" does this by subclassing the current grammar, adding or removing rules in the subclass, and then reverting to the previous grammar at the end of the current lexical scope.

In Raku, a regular expression is another syntax for writing code. It just has a slightly different default syntax and behavior. It can have both parameters and variables. If the regular expression syntax isn't a good fit for what you are trying to do, you can embed regular Raku syntax to do whatever you need and then return right back to regular expression syntax.

It also has a much better syntax for doing advanced things, as it was completely redesigned from first principles.

The following is an example of how to match at least one `A` followed by exactly that number of `B`s and exactly that number of `C`s. (Note that bare square brackets [] are for grouping, not for character classes.)

my $string = 'AAABBBCCC';
say $string ~~ /
    ^
    # match at least one A
    # store the result in a named sub-entry
    $<A> = [ A+ ]
    {} # update result object
    # create a lexical var named $repetition
    :my $repetition = $<A>.chars(); # <- embedded Raku syntax
    # match B and then C exactly $repetition times
    $<B> = [ B ** {$repetition} ]
    $<C> = [ C ** {$repetition} ]
    $
/;
Result: 「AAABBBCCC」
A => 「AAA」
B => 「BBB」
C => 「CCC」
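
The printed view above is only the default rendering of the match object. As a minimal sketch (assuming the same regex as above, stored in a variable I am calling `$match` purely for illustration), the named captures can also be queried directly:

```raku
my $string = 'AAABBBCCC';

# Same regex as above, with the comments trimmed.
my $match = $string ~~ /
    ^
    $<A> = [ A+ ]
    {}
    :my $repetition = $<A>.chars;
    $<B> = [ B ** {$repetition} ]
    $<C> = [ C ** {$repetition} ]
    $
/;

say $match.Str;             # AAABBBCCC  (the whole matched text)
say $match<A>.Str;          # AAA        (text of the capture named A)
say $match<B>.from;         # 3          (offset where the B capture starts)
say $match<C>.to;           # 9          (offset just past the C capture)
say $match.hash.keys.sort;  # (A B C)    (names of the named captures)
```

These are only a few of the available methods; which ones are useful depends on what you want to do with the result.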
The result is actually a very extensive object that has many ways to interrogate it. What you see above is just a built-in human-readable view of it.

In most regular expression syntaxes, to match equal numbers of `A`s and `B`s you would need to recurse in between `A` and `B`. That of course wouldn't allow you to also do the same for `C`, and it wouldn't be anywhere near as easy to follow as the regex above. The regex above should also run fairly fast, because it never has to backtrack or recurse.

When you combine regexes like this into a grammar, you get a full parse tree. (Actually you can do that without a grammar; it is just easier with one.)

To see an actual parser, I often recommend people look at JSON::Tiny::Grammar https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Gra...

Frankly, from my perspective much of the design of "actual parsers" is a byproduct of limited RAM on early computers. The reason for a separate tokenization stage was to reduce the amount of RAM used for the source code, so that later stages had enough RAM for semantic analysis and the eventual compiling of the code. In my view it doesn't really do much to simplify any of the later stages.

The JSON::Tiny module from above creates the native Raku data structure using an actions class while the grammar is parsing, meaning it is parsing and compiling as it goes. |
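
To make the actions-class idea concrete, here is a rough sketch. It is not JSON::Tiny's code: the grammar `KeyValue`, the class `KeyValueActions`, and the token names are all hypothetical, made up for a toy input format like `a=1,b=22`. Each action method runs as soon as the token of the same name matches, so the hash is built during the parse rather than in a later pass.

```raku
# Hypothetical toy grammar for input such as "a=1,b=22".
grammar KeyValue {
    token TOP   { <pair>+ % ',' }          # one or more pairs separated by commas
    token pair  { <label> '=' <value> }
    token label { <[a..z]>+ }
    token value { \d+ }
}

# Hypothetical actions class: method names mirror the token names.
class KeyValueActions {
    # Turn the matched digits into a number.
    method value($/) { make +$/ }

    # Build a Pair from the label text and the already-converted value.
    method pair($/)  { make $<label>.Str => $<value>.made }

    # Collect all the Pairs into a native Raku hash.
    method TOP($/) {
        my %result = $<pair>.map(*.made);
        make %result;
    }
}

my $parsed = KeyValue.parse('a=1,b=22', actions => KeyValueActions.new);

say $parsed.made<a>;   # 1
say $parsed.made<b>;   # 22
```

The parse tree is still available on `$parsed`; `.made` just carries whatever the action methods attached along the way.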
The main problem with generalised regexes is that you can't match them in linear time worst-case. I'm wondering if this is addressed at all by Raku.