| HN Mirror


by parhamn 2244 days ago

I sometimes wonder what a syntactically clarified regex could look like. There are three things that often confuse newcomers:

- What escapes are, and what needs to be escaped?

- The <character-class><repetitions> structure of a regex.

- Syntax around things like capture (are the parens part of some matcher? what needs to be escaped?)

We should have a version of regex that separates characters, character classes and operators, or whatever the regex jargon for those things is. Half the things I usually want to match with a regex, like parens on a function or dot accessors, need to be escaped!
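The escaping and capture confusion described above is easy to reproduce. Below is a minimal sketch using Python's `re` module (Python is an arbitrary choice; the post names no language), with a made-up input string and a hypothetical helper `whole_matches`. It shows an unescaped `.` matching any character and unescaped `()` silently becoming a capture group, then two ways to match `obj.method()` literally.

```python
import re

text = "call obj.method() and objXmethod()"

def whole_matches(pattern: str, s: str) -> list:
    """Return every full match (group 0), ignoring any capture groups."""
    return [m.group(0) for m in re.finditer(pattern, s)]

# Unescaped: '.' matches any character and '()' is parsed as an empty
# capture group, so this also matches "objXmethod" and never the parens.
loose = whole_matches(r"obj.method()", text)

# Escaped by hand: '\.', '\(' and '\)' match the literal characters.
strict = whole_matches(r"obj\.method\(\)", text)

# re.escape() inserts the backslashes for you.
auto = whole_matches(re.escape("obj.method()"), text)

print(loose)   # ['obj.method', 'objXmethod']
print(strict)  # ['obj.method()']
print(auto)    # ['obj.method()']
```

The helper reads `group(0)` on purpose: `re.findall` returns the contents of capture groups when a pattern has any, so with the unescaped pattern it would return empty strings, which is one more way the accidental group bites.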

A quick example for illustration purposes (please don't point out why this grammar won't map to regex):

    <startofline>(['a' or 'b']<2,4,greedy>, captureAs="prefix")[number or '.']<2><endofline>

is definitely more approachable and easier to explain than the regex equivalent (which I'm avoiding writing because I don't have time to test whether I got the capture syntax right).
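For comparison, here is one plausible regex reading of the pseudo-grammar above, in Python's `re` syntax. It rests on assumptions the post does not spell out: `number` means a single ASCII digit, `<2>` means exactly two repetitions, and `[number or '.']` is a single character class. The `re.VERBOSE` flag, which ignores unescaped whitespace and allows `#` comments, is shown as one existing way to get some of the readability the post asks for.

```python
import re

# Compact form. Assumptions: 'number' = one ASCII digit, <2> = exactly two,
# and the whole string must match from start of line to end of line.
COMPACT = r"^(?P<prefix>[ab]{2,4})[0-9.]{2}$"

# The same pattern with re.VERBOSE: whitespace outside character classes is
# ignored and '#' starts a comment, so each piece can be annotated.
VERBOSE = re.compile(r"""
    ^                       # start of line
    (?P<prefix> [ab]{2,4} ) # 2 to 4 of a/b, greedy by default, captured as prefix
    [0-9.]{2}               # exactly two chars, each a digit or a literal '.'
    $                       # end of line
""", re.VERBOSE)

for s in ["aab12", "ab.5", "a12", "abc12"]:
    m = VERBOSE.match(s)
    print(s, "->", m.group("prefix") if m else None)
```

Note that inside a character class such as `[0-9.]`, the `.` is already literal and needs no backslash, which is exactly the kind of context-dependent escaping rule the post finds confusing.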

Maybe someone will make a Wasm regex-simple transformer that we can use in multiple languages. Regex is too useful to have such a scary syntax for newcomers!

1 comment

yoz-y 2244 days ago

I think most people just like to hate on regex syntax because, at a glance, it looks like spilled tea leaves.

However, I'd argue that it's not actually very hard to learn, and its brevity makes it easier to retain. (Personally, I learned it using https://www.regular-expressions.info/tutorial.html.)

I agree that escaping is a problem, mainly because languages often have different rules for it.
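One reason the rules feel inconsistent is that the host language's string literals can add a second escaping layer on top of the regex's own. A small Python sketch of that, matching the two literal characters `\d` (a backslash followed by `d`); other languages' literal rules differ, so this only illustrates the Python case.

```python
import re

# Target: a 2-character string, a literal backslash followed by 'd'.
target = "\\d"

# Layer 1 (regex): a literal backslash is written \\ in the pattern.
# Layer 2 (Python string literal): in an ordinary string, each of those
# backslashes must itself be escaped, giving four backslashes.
ordinary = "\\\\d"
raw = r"\\d"            # raw strings skip layer 2

print(ordinary == raw)                     # True: both are the 3-char pattern \\d
print(bool(re.fullmatch(raw, target)))     # True
print(bool(re.fullmatch(r"\d", target)))   # False: \d means "one digit"
```

Raw strings are the usual Python idiom for patterns precisely because they remove the second layer.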
