by dougmwne 1678 days ago
I think regex is absolutely terrible garbage. It’s very powerful, I do appreciate not having to write complex conditional statements, and it’s great that it’s available in so many languages and applications. But it’s just a bad tool, terribly unreadable, easy to introduce bugs, with lots of trial and error. It has too much brevity and would be much better if it were longer but more human readable. I’m sure there are other string search paradigms that are far better but relatively unknown.
10 comments

You can get a feel for what that would look like here: https://metacpan.org/release/CHROMATIC/Regexp-English-1.01/v...

But then you're just memorizing things like 'start_of_line' instead of '^'. Perhaps easier to read, but no easier to write.

        -> start_of_line
        -> literal('Flippers')
        -> literal(':')
        -> optional
                -> whitespace_char
        -> end
        -> remember
                -> multiple
                        -> digit;
I literally can’t parse this as a whole. /^Flippers:\s?(\d+)/ is so much more obvious compared to that utter nonsense.
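For anyone who wants to poke at the terse form, here is a minimal Python sketch of the pattern quoted above. The helper name `flipper_count` and the sample lines are made up for illustration; only the pattern itself comes from the comment.

```python
import re

# The terse pattern from the comment above, compiled once.
FLIPPERS = re.compile(r"^Flippers:\s?(\d+)")

def flipper_count(line):
    """Return the captured number as an int, or None if the line does not match."""
    m = FLIPPERS.search(line)
    return int(m.group(1)) if m else None

print(flipper_count("Flippers: 42"))   # one optional whitespace char is allowed
print(flipper_count("Flippers:7"))     # ...and it may be absent
print(flipper_count("Flippers:  7"))   # two spaces: \s? covers at most one
```

Note that `\s?` accepts at most one whitespace character, so the last call finds no match.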
Like most code, it's easier to write regex than to read it later. In my recent vim history:

    /(\([^()]\|\n\)*\n\([^()]\|\n\)*)
This was from two days ago. I think I was searching a huge sheet of regex match groups for any having line breaks to join. In a month, I'm not even sure I would recognize that I had authored this.
So what. That was a problem you had to solve, imagine how helpless you’d feel if you had it with no regex available. Matching non-parenthesis or newline for two lines (prefix and suffix unrestricted) it is. Idk if it took half an hour or more to implement that in python, js or (god forbid) a low level language. You probably made it in less than a minute. And nobody would take their time to read a page of .substr(i, -(j-i)-1) two days later either.
not every solution has to be reusable
Your long-hand isn't quite the same as your regex...it should be remember -> one_or_more -> digit;

In regex parlance, \d+ explicitly allows for one or more digits. Multiple tacitly implies 2 or more which would be \d{2,}

Also, your end char (which I assume you mean $) would be after the remember -> one_or_more -> digit;
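On the quantifier point, a quick Python check of the difference between `\d+` and `\d{2,}`. This only illustrates the two regex quantifiers; it makes no claim about what Regexp::English's `multiple` actually compiles to. The helper `full` is a made-up name.

```python
import re

one_or_more = re.compile(r"\d+")     # at least one digit
two_or_more = re.compile(r"\d{2,}")  # at least two digits

def full(pattern, text):
    """True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

for text in ["7", "42", ""]:
    print(repr(text), full(one_or_more, text), full(two_or_more, text))
```

A single digit such as `"7"` satisfies `\d+` but not `\d{2,}`.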

I didn’t refer to the manual (which is the entire goal of that format, isn’t it?) and don’t know what ‘multiple’ really means. So I stand both corrected and confirmed, I guess.

That ‘end’ thing just closes the ‘optional’ group, I believe. There is no $ in an English form of this regex either.

Readability is very important though. If you can spend a couple more seconds of programming time to prevent several minutes (or longer!) of understanding time, I'd call that a good use of resources. I don't think that link is quite there yet but it's a good start.
It's more readable individually, but for many regexes the verbose nature could make it harder to read overall.
There's a good article about K that can give you a feel on how long names may not always be more readable: http://nsl.com/papers/denial.html
The readability isn't so bad if you let yourself allocate as much time and mental effort to understanding the one-line regex as you would use to understand the 100 line string-processing function that it replaces. And the brevity makes regexes handy on the command line and in single-line input fields in text editor search functions.

I do prefer using parser combinators for more complex tasks.

> I’m sure there are other string search paradigms that are far better but relatively unknown

Surely if there were, we’d have discovered them already. All of the regex criticism boils down to a few simple statements for categories of cases:

1) I didn’t learn regex and have no cheatsheet

Learn it or at least print a cheatsheet and stick it to the wall.

2) The problem that this specific regex solves is a hell of a regular problem under any representation.

Any particular regex is only as terrible as a ladder of corresponding if’s and for’s would be. Deal with it.

3) The problem that this specific regex solves is not a regular language.

Use a proper xml parser.
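To make case 3 concrete, a rough sketch with Python's standard `xml.etree.ElementTree`. Arbitrarily deep nesting is the classic thing a regular expression cannot track, while a parser hands you the tree. The sample document and the helper `max_depth` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <a> elements nested inside each other.
doc = "<a><b>one</b><a><b>two</b></a></a>"
root = ET.fromstring(doc)

def max_depth(elem):
    """Depth of the deepest element below (and including) elem."""
    return 1 + max((max_depth(child) for child in elem), default=0)

# Text of every <b>, at any nesting level, in document order.
texts = [b.text for b in root.iter("b")]

print(texts)
print(max_depth(root))
```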

You seem to forget that regular expressions are pretty much simply required - and at least for the simpler cases, their syntax is reasonable - 'syn[a-z ]+?able' is far from unreadable and unwritable.
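A small Python sketch of that pattern, mostly to show what the lazy `+?` buys you compared with a greedy `+`. The sample sentence is made up.

```python
import re

text = "a syntax is reasonable and readable"

# Lazy: stop at the first place where "able" can follow.
lazy = re.search(r"syn[a-z ]+?able", text).group()

# Greedy: run as far as possible, then back off to the last "able".
greedy = re.search(r"syn[a-z ]+able", text).group()

print(lazy)
print(greedy)
```

The lazy form stops at "reasonable", while the greedy one runs on to the end of "readable".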

You have some text to process, open your text editor, you will probably use a dozen regular expressions for that - this is very frequent for many. Can you conceive a better syntax, at least for the simple cases?

Ignoring the flame bait, for me the only things I wish more regex engines supported (cough JavaScript) are the ability to ignore whitespace and named groups. Python has a flag to do this, and being able to have multi-line regexes with comments and named groups is phenomenal and greatly improves readability of more complex regexes.

In general I would say ~70% of regexes are highly readable. With tools like the above, you can probably go to like ~85%? There are some regexes that are super complicated and then likely should be refactored into a composition of simpler regexes. But that's just a guess. I wonder if there are any studies done about this...

> Ignoring the flame bait, for me the only things I wish more regex engines supported (cough JavaScript) is the ability to ignore whitespace, and have named groups.

Irregex? http://synthcode.com/scheme/irregex/

Interesting! I don't think that's what I mean. I don't think I want it to be a part of the language like that, but that's a pretty neat idea. For example, python has an 'X' flag you can use when creating a regex to allow new lines and comments. Here's an example from my code: https://github.com/internetarchive/openlibrary/blob/1ac15a48...
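For readers who have not seen that flag, a minimal sketch of `re.VERBOSE` (alias `re.X`) together with named groups. This is not the code behind the link above; the date pattern and the name `ISO_DATE` are just an illustration.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment.
ISO_DATE = re.compile(
    r"""
    ^
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    $
    """,
    re.VERBOSE,
)

m = ISO_DATE.match("2020-09-14")
print(m.groupdict())
print(m.group("year"))
```

The compiled pattern behaves the same as the one-liner `^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$`; only the source layout changes.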
I’d argue that regex is elegant and incredibly useful to have in development… but it’s definitely easy to have ‘too much of a good thing’ here.
Regex is a great tool. Just use good taste and don't overdo it with regexes.

They're very effective at what they do as long as you don't make insane brainteasers that make people curse your name.

This is why I like tools such as RegExBuddy, which breaks the regex down into a graphic. It highlights matches in test text in real time and emulates most regex engines.
It’s just a small programming language.