by paulryanrogers 3022 days ago
While useful to some, I think advanced REs are like mixing in Perl or playing code golf with production code. They tend to make code harder to read.

My preference in such cases is for several separate REs, or a longer one that is at least split up in the surrounding code, with each part named or heavily commented. Of course, it's always worthwhile to consider non-RE solutions if the problem can be broken down enough.
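As a rough illustration of the "named and commented parts" approach, here is a minimal Python sketch. The date format and the names `YEAR`, `MONTH`, `DAY`, `DATE_RE` and `parse_date` are made up for the example; the only library features used are `re.VERBOSE` and named groups.

```python
import re

# Hypothetical example: validate a date such as 2024-01-31.
# Each piece gets its own name so it can be read (and tested) separately.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
DATE_RE = re.compile(
    rf"""
    ^{YEAR}    # four-digit year
    -{MONTH}   # month, 01 through 12
    -{DAY}     # day, 01 through 31 (not checked against the month)
    $
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return the named parts as a dict, or None if the text does not match."""
    match = DATE_RE.match(text)
    return match.groupdict() if match else None
```

Calling `parse_date("2024-01-31")` gives back the year, month and day under their group names, while something like `parse_date("2024-13-01")` returns `None`.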

EDIT: Fixed typo

3 comments

I agree, with regard to production code. However, I use regular expressions constantly, many times a day, to grep through my code locally looking for particular things. Knowing the advanced tools comes in very handy when you are looking for something unusual. Grep may be my most important programming tool next to vim.
Fair enough, but I really think the benefits of advanced regular expressions are underappreciated in non-production and even non-application contexts. Laypeople (and occasionally even developers) are impressed when you show them how to search through a document or file system using a really complicated pattern, where it would have taken several iterations of data manipulation to achieve the same result without using advanced regular expressions.
It'd help a lot if the grammar were actually readable. Combinations like .* don't visually "read" like a single unit, and then to make everything worse you often need a crazy number of backslashes.

I'm not sure how you could fix that without introducing completely new characters or color-coding parts of the expression though.

The backslashes for escaping are absolutely awful. This is one of the worst things about Java.

It's much better in languages with regex literals like Ruby and JavaScript.

It's especially nice in Ruby (which got it from Perl), where you can use whatever delimiters you like for regexes, with /abc/, %r"abc", %r{abc}, %r#abc# and so on all being equivalent, so you can just about always pick something that won't clash with the characters in your pattern. (You can even use spaces as the delimiters, which looks terrible.)
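To make the backslash complaint concrete: Python has no regex literals either, but its raw strings play a similar role. A small sketch, where the LaTeX-style pattern and the names `ESCAPED`, `RAW` and `section_name` are just illustrative:

```python
import re

# Ordinary string literal: the string parser eats one level of escaping,
# so every backslash meant for the regex engine has to be doubled.
ESCAPED = "\\\\section\\{(\\w+)\\}"

# Raw string literal: backslashes reach the regex engine unchanged.
RAW = r"\\section\{(\w+)\}"

# Both spellings produce exactly the same pattern text.
assert ESCAPED == RAW


def section_name(text):
    """Return the name inside \\section{...}, or None if there is none."""
    match = re.search(RAW, text)
    return match.group(1) if match else None
```

Even the raw form still needs `\\` to match one literal backslash, since that escaping belongs to the regex syntax itself rather than to the host language.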
I agree. Usually, I end up leaning on PEGs instead:

https://nim-lang.org/docs/pegs.html

That's pretty bad:

    import pegs
    echo "xzxy" =~ peg"""
    B <- A 'x' 'y' / C
    A <- '' / 'x' 'z'
    C <- C 'w' / 'v'
    """
Stack overflow

Nim needs to let go of its toy parsing algorithm.
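For anyone wondering why a rule like C <- C 'w' / 'v' can blow the stack: the sketch below is hypothetical Python, not Nim's actual implementation, and it assumes a plain recursive-descent evaluator. Translated literally, the rule calls itself before consuming any input, so it never makes progress. The function names are made up for the example.

```python
def parse_c_naive(text, pos=0):
    """Literal translation of C <- C 'w' / 'v'.

    The first alternative starts by parsing C again at the same position,
    so the recursion never bottoms out.
    """
    end = parse_c_naive(text, pos)  # left recursion: nothing consumed yet
    if end is not None and text.startswith("w", end):
        return end + 1
    if text.startswith("v", pos):
        return pos + 1
    return None


def parse_c_iterative(text, pos=0):
    """Hand-rewritten form without left recursion: C <- 'v' 'w'*.

    Returns the position after the match, or None if there is no match.
    """
    if not text.startswith("v", pos):
        return None
    pos += 1
    while text.startswith("w", pos):
        pos += 1
    return pos


def naive_overflows(text):
    """Report whether the naive version exhausts Python's recursion limit."""
    try:
        parse_c_naive(text)
    except RecursionError:
        return True
    return False
```

Rewriting the rule by hand as 'v' 'w'* is the usual workaround when the parser does not handle left recursion itself; whether that is acceptable is exactly the disagreement above.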