|
|
|
|
|
by tolmasky
589 days ago
|
|
I implemented something similar to the compositional regular expressions feature described here for JavaScript a while ago (independently, so semantics may not be the same), and it is one of the libraries I find myself most often bringing into other projects years later. It gets you a tiny bit closer to feeling like you have a first-class parser in the language. Here is an example of implementing media type parsing with regexes using it: https://runkit.com/tolmasky/media-type-parsing-with-template... "templated-regular-expression" on npm, GitHub: https://github.com/tolmasky/templated-regular-expression To be clear, programming languages should just have actual parsers and you shouldn't use regular expressions for parsers. But if you ARE going to use a regular expression, man is it nice to break it up into smaller pieces. |
|
Raku regular expressions combined with grammars are far more powerful, and if written well, easier to understand than any "actual parser". In order to parse Raku with an "actual parser" it would have to allow you to add and remove things from it as it is parsing. Raku's "parser" does this by subclassing the current grammar adding or removing them in the subclass, and then reverting back to the previous grammar at the end of the current lexical scope.
In Raku, a regular expression is another syntax for writing code. It just has a slightly different default syntax and behavior. It can have both parameters and variables. If the regular expression syntax isn't a good fit for what you are trying to do, you can embed regular Raku syntax to do whatever you need to do and return right back to regular expression syntax.
It also has a much better syntax for doing advanced things, as it was completely redesigned from first principles.
The following is an example of how to match at least one `A` followed by exactly that number of `B`s and exactly that number of `C`s.
(Note that bare square brackets [] are for grouping, not for character classes.)
Result: The result is actually a very extensive object that has many ways to interrogate it. What you see above is just a built-in human readable view of it.In most regular expression syntaxes to match equal amounts of `A`s and `B`s you would need to recurse in-between `A` and `B`. That of course wouldn't allow you to also do that for `C`. That also wouldn't be anywhere as easy to follow as the above. The above should run fairly fast because it never has to backtrack, or recurse.
When you combine them into a grammar, you will get a full parse-tree. (Actually you can do that without a grammar, it is just easier with one.)
To see an actual parser I often recommend people look at JSON::TINY::Grammar https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Gra...
Frankly from my perspective much of the design of "actual parsers" are a byproduct of limited RAM on early computers. The reason there is a separate tokenization stage was to reduce the amount of RAM used for the source code so that further stages had enough RAM to do any of the semantic analysis, and eventual compiling of the code. It doesn't really do that much to simplify any of the further stages in my view.
The JSON::Tiny module from above creates the native Raku data structure using an actions class, as the grammar is parsing. Meaning it is parsing and compiling as it goes.