Hacker News
by brainwipe 1678 days ago
I learn regex just before I need it. Every. Time. After 25 years, it just doesn't stick.
10 comments

I was excellent at regex early in my career... actually had a job where that's basically all I did for 9 months. Read the O'Reilly book Mastering Regular Expressions from cover to cover and referenced it multiple times per day. Doing regex at a high level was instinctual.

Lost much of that knowledge within a couple years after leaving that job... was shocked how much wasn't retained when I stumbled into another project that required a fair bit of regex work. There was some muscle memory involved and I was able to ramp up quickly, but now 20 years after that initial job I'm just like you.

In fairness, I don't think I was ever a heavy user quite like you describe but if I take another language that I knew well and now dip into every so often - like C - I'm going again in 5 minutes. I don't find that with regex. I do try ... I think it's cool but it just slides straight out.
I think the difference though is C is not unlike most procedural programming languages. The main thing I imagine is coming back up to speed w/ stdlib and having to get back into the rhythm of managing memory. Regex is unlike anything you typically do on a day-to-day basis. It's just an alien mental model and once you get past the basics feels like you've acquired some sort of superpower few actually need.

It's like being a native English speaker who also speaks a few other Latin-based languages vs. having that one Eastern language thrown into the mix.

Same. I just don't use it often enough for it to really stick. For many years my regex knowledge was basically stuck at how to use ".*" and things like [a-d] or [a-dA-D] blah, etc. Never used backreferences, capture groups, etc.

The thing that finally forced me to dig a little deeper was an assignment at work that involved Apache HTTPD and mod_proxy and the need to define some really complex routing rules that were imposed on us by something upstream of our service. We wound up having to peek into the incoming URL and route things differently based on sub-elements of the overall path. So I finally had to learn to use capture groups and get into the difference between the "greedy" and "non greedy" matching, yadda, etc. And the thing is, when I figured it out and got all that working, I felt like I'd acquired a new super-power.
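For anyone who wants the capture group and greedy vs. non-greedy distinction in one place, here is a minimal sketch in Python. The path is made up for illustration and this is not the actual mod_proxy config from that assignment; only the matching behaviour is the point.

```python
import re

# Hypothetical incoming path, invented for this example.
path = "/api/v2/users/42/orders/7"

# Capture groups: pull the version and the first resource out of the path.
m = re.match(r"^/api/(v\d+)/([^/]+)/", path)
version, resource = m.group(1), m.group(2)    # "v2", "users"

# Greedy: .* takes as much as it can, so the group runs to the LAST slash.
greedy = re.match(r"^/(.*)/", path).group(1)  # "api/v2/users/42/orders"

# Non-greedy: .*? takes as little as it can, so the group stops at the
# first slash after the leading one.
lazy = re.match(r"^/(.*?)/", path).group(1)   # "api"
```

The only difference between the last two patterns is the `?` after `.*`, which is exactly the kind of detail that evaporates after a few weeks.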

For about 3 weeks. Now, I'm pretty sure all of the new stuff I learned has totally escaped my memory again, because - once again - I haven't had any call to touch a regex in almost 2 years.

*sigh* I should probably look for an Anki deck on regexes and start doing spaced repetition on them just to try to finally get this stuff locked in.

I simply don't want to memorize 3 or more different versions of the syntax codes.

Sometimes I'm using .Net. Sometimes I'm using Python. Sometimes I'm using whatever oddball engine the developer chose.

I know how regex works. I can use forward and backward references. I can combine complex patterns. I can match, extract, replace, transform, etc. Sometimes I have even used nested patterns (though I'd probably need half an hour to re-learn it well enough to read one). But I'm not sitting down and memorizing the difference between \d and \D, or \S and \w or any of that. Frankly, I'm very lucky if I remember the difference between ^ and $. I will have the .Net[0] and the Python[1] doc in my bookmarks forever.
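For reference, a small sketch of what those codes do in Python's `re` module. The sample string is invented, and other engines may treat these classes slightly differently (Unicode handling in particular), so treat it as a cheat sheet rather than a spec.

```python
import re

sample = "id_42 x-y"  # made-up string for illustration

digits = re.findall(r"\d", sample)       # digits: ['4', '2']
non_digits = re.findall(r"\D+", sample)  # runs of non-digits: ['id_', ' x-y']
words = re.findall(r"\w+", sample)       # letters, digits, underscore: ['id_42', 'x', 'y']
non_space = re.findall(r"\S+", sample)   # anything but whitespace: ['id_42', 'x-y']

# ^ anchors at the start of the string, $ at the end.
starts_with_id = re.search(r"^id", sample) is not None  # True
ends_with_y = re.search(r"y$", sample) is not None      # True
starts_with_x = re.search(r"^x", sample) is not None    # False
```

The lowercase code matches the class and the uppercase one matches its complement, which is about the only pattern there is to hang on to.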

I'm not remotely ashamed of it, either. The codes are completely arbitrary with absolutely no intrinsic meaning. Worse, it's not easy to tell the difference at a glance between literal characters, character classes, operators, wildcards, special constructs, etc. More than once I've been confused by a regex only to discover it does something I didn't know they could even do. Regex patterns are meant to be concise and comprehensible to the regex engine, not to the programmer.

Don't feel bad because you don't memorize an arbitrary and complex syntax. Memorizing syntax is not the job of a programmer. The job of a programmer is to compose the logic and design the system and know that a syntax exists to compose it in. A programmer is an author, not a linguist.

[0]: https://docs.microsoft.com/en-us/dotnet/standard/base-types/...

[1]: https://docs.python.org/3/howto/regex.html

That's one heavy pain point in emacs, for instance. Even though it seems trivial to map char classes and switch syntax, it's so utterly tiring. To the point where my first reaction is "cli unix tools share most of the syntax, it's so fluid..." *runs grep*
I agree. Regex for me is particularly slippery - I haven't needed to code 68000 assembler since 2002 but I had a play with an emulator last month and slipped back into it. There's something about the syntax that my brain just can't hold onto.
Same here. I never need "regex" as a skill by itself in a project. I just need to search/replace one damn string for one particular situation, and next year it's another situation in a completely unrelated place for completely unrelated reasons, so whatever I learned (and forgot) last year wouldn't really apply anyway.
I'm the same way. I suppose some people use regexes a lot, I never really have.

If our experiences are typical, I'd argue people new to regexes should mostly learn that they exist and when they can be useful, and not worry too much about actually learning their mechanics.

This was my approach and I found it pretty successful the first and most recent time I used regex.
I think that's okay. It's the same for me. The important part is understanding the concept of regex enough to know when you need it.
I have some basics down pat (., +, *, ?, ^, $, [], [^], (), \d) and anything more I always have to look up (and much of it differs between engines anyway). Usually, though, what takes more time to figure out than the regex itself is whatever unholy combination of escaping rules is in force in the place where I have to use it.
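To make the escaping point concrete, here is a sketch of the two layers in Python: the string literal's own escaping and the regex engine's. Other hosts (shells, JSON, config files) stack their own rules on top, which this does not cover. The file name is made up.

```python
import re

# A literal dot has to be escaped for the regex engine. The raw string
# keeps Python's own string escaping out of the way.
version = re.compile(r"\d+\.\d+")
ok = version.fullmatch("3.14") is not None   # True
bad = version.fullmatch("3x14") is not None  # False

# Without a raw string, every backslash must be doubled to get the same pattern.
same_pattern = "\\d+\\.\\d+" == r"\d+\.\d+"  # True

# For literal text full of metacharacters, re.escape does the escaping.
literal = "a+b (draft).txt"
escaped_hit = re.fullmatch(re.escape(literal), literal) is not None  # True
unescaped_hit = re.fullmatch(literal, literal) is not None           # False
```

The last line is the classic trap: used as a pattern, the string does not even match itself, because `+`, `(`, `)` and `.` are all operators.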
I'm with you - if I'm doing a quick find/replace in VSCode then I can usually muddle a way through but anything more complex leaves me somewhat at a loss.
That course looks great and I'm sure that at the end of it I will be absolutely rocking at regex. Then I won't use it for 4 months and need it for something and all but the basics will be gone. I'll essentially have to relearn it each time. I don't think it's the learning material that's the problem, I've even done interactive Jupyter training, there's something about Regex itself that just won't stick.
I believe part of the problem is the slight syntax variations between different languages/environments. I've been writing regexes in Unix utilities (vim, grep/sed) for the most part, and when I found an answer on SO about a specific problem I was trying to solve, it was in JS. I barely understood what was written, to the point that I stopped bothering to "transcribe" it to UNIX syntax.
Why use it though? I'd rather write or read 30 lines of code instead of trying to decipher what a regex means or does and making sure it covers all possible cases.

I don't understand the goodwill toward regexes. It's basically an embedded BrainFuck in your programming language.

I'm 14 years into my dev career, and there has never been a moment where not using regexes became a problem.

> Why use it though?

Because a well written regex performs extremely well (regex engines are often very highly optimized).

It gives you all the benefits of using a domain-specific language and using an extremely mature software library. Just like a domain-specific language, it will have a baked-in philosophy involving the exact task you want to accomplish, so it will not suffer from language vs algorithm impedance. Just like using a mature library, it will probably have accounted for weird oddball cases that you're not even thinking of and have enough features to do everything you will want.

> It's basically an embedded BrainFuck in your programming language.

I don't disagree. It's not easy to read and can be hard to maintain. There are ways to write regex such that it's easier to understand, but the syntax generally doesn't make it easy to do that and doesn't encourage you to spend the time on it.

However, when you see a regex, you do know that it's 100% used to manipulate strings. That alone tells you quite a bit about what is going on.
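On the "ways to write regex such that it's easier to understand" point, one option in Python is `re.VERBOSE` plus named groups. This is a sketch of that one technique, not the only approach, and the date format is just an example.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments,
# so each piece can sit on its own line with an explanation.
DATE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

m = DATE.fullmatch("2021-03-09")
year, month, day = m.group("year"), m.group("month"), m.group("day")
# ("2021", "03", "09")
```

It is the same pattern as `(\d{4})-(\d{2})-(\d{2})`, just laid out so the next reader does not have to decode it. Not every engine has an equivalent mode.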

I think it's case by case. I once scraped a list of names and dates from individual Wikipedia pages. There were lots of formats like "1900-1950", "1900 - 1950", "(1900 - 1950)", "(1900 to 1950)", "1900 to cf. 1950", and so on. These were arbitrarily nested in the first couple sentences.

My thought was "Oh I think this is a job for that regex thing" and 35 minutes of googling syntax + a handful of passes later I had all the dates in a workable table. I have no idea how much code that would have taken. Albeit, I am a novice programmer.
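For a sense of scale, here is a rough Python sketch that handles the five formats quoted above. It is a guess at the shape of the solution, not the pattern the commenter actually wrote, and real Wikipedia text would likely need more cases (other dash characters, "c.", single years, and so on). The `extract` helper is a hypothetical name.

```python
import re

# Two four-digit years separated by "-" or "to", with an optional "cf."
# before the second year. Surrounding parentheses are simply not matched.
RANGE = re.compile(r"(\d{4})\s*(?:-|to)\s*(?:cf\.\s*)?(\d{4})")

def extract(text):
    """Return (start, end) for the first year range in text, or None."""
    m = RANGE.search(text)
    return (int(m.group(1)), int(m.group(2))) if m else None

samples = [
    "1900-1950",
    "1900 - 1950",
    "(1900 - 1950)",
    "(1900 to 1950)",
    "1900 to cf. 1950",
]
rows = [extract(s) for s in samples]  # every entry is (1900, 1950)
```

One line of pattern covers all five variants, which is roughly the trade the comment describes: time spent googling syntax instead of time spent writing parsing code.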