| I sometimes wonder what a syntactically clarified regex could look like. There are a few things that often confuse newcomers: - What escapes are and what needs to be escaped. - The <character-class><repetitions> structure of a regex. - The syntax around things like capture (are the parens part of some matcher? what needs escaping?). We should have a version of regex that separates characters, character classes and operators, or whatever the regex jargon for those things is. Half the things I usually want to regex for, like parens on a function or dot accessors, need to be escaped! A quick example for illustration purposes (please don't point out why this grammar won't map to regex): <startofline>(['a' or 'b']<2,4,greedy>, captureAs="prefix")[number or '.']<2><endofline>
is definitely more approachable and easier to explain than the regex equivalent (which I'm not writing out because I don't have time to test whether I got the capture syntax right). Maybe someone will make a wasm regex-simple transformer we can use in multiple languages. Regex is too useful to have such a scary syntax for newcomers! |
However, I'd argue that it's not actually very hard to learn, and its brevity makes it easier to retain. (Personally, I learned it from https://www.regular-expressions.info/tutorial.html)
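For comparison, here is roughly how I would write the quoted pseudo-grammar as a real pattern. This is only a sketch in Python's `re` flavor, and it rests on my own assumptions: I read "number" as a single digit, and the name `PATTERN` and the sample input are made up. Verbose mode lets each piece carry a comment, which gets part of the way toward the readability being asked for.

```python
import re

# Assumed translation of the quoted pseudo-grammar; "number" is read as one digit.
PATTERN = re.compile(
    r"""
    ^                      # start of the string (add re.MULTILINE for per-line matching)
    (?P<prefix>[ab]{2,4})  # 'a' or 'b', two to four times (greedy), captured as "prefix"
    [0-9.]{2}              # a digit or a literal '.', exactly twice
    $                      # end of the string
    """,
    re.VERBOSE,
)

match = PATTERN.match("abba4.")  # hypothetical input
print(match.group("prefix") if match else None)  # abba
```

Note that the dot needs no backslash inside the character class, which is one of the rules newcomers tend to trip over.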
I agree that escaping is a problem, mainly because languages often have different rules for it.
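Many regex libraries ship a helper that escapes literal text for you, which sidesteps part of the problem. A minimal sketch using Python's `re.escape`; the `source` string and the searched-for text are hypothetical, and other languages have their own helpers with their own rules.

```python
import re

# Hypothetical snippet to search in; not taken from the thread.
source = "result = obj.method(arg)"

# re.escape backslash-escapes regex metacharacters such as '.' and '('.
literal = re.escape("obj.method(")
print(literal)  # obj\.method\(

# Unescaped, the same text is not even a valid pattern (unbalanced parenthesis).
try:
    re.compile("obj.method(")
except re.error as exc:
    print("invalid pattern:", exc)

print(re.search(literal, source) is not None)  # True
```

The escaped pattern matches only the literal text, so the dot no longer stands for "any character".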