Hacker News new | ask | show | jobs
by supriyo-biswas 867 days ago
Regular expressions are probably not a good example because you will eventually write a regex that has catastrophic backtracking behavior. I encountered one a few months ago, so it’s not at all uncommon or difficult to encounter. If you’re curious enough, you’d end up reading about how regexes work under the hood.

A better example might be that of a compiler, where you (very rarely) need to look at the asm output or encounter a case where the compiler generates incorrect code and you need to debug why.

1 comments

> Regular expressions are probably not a good example because you will eventually write a regex that has catastrophic backtracking behavior.

Nope, that will never happen to me. No regular expression has that behaviour.

There are some bad implementations of regular expression matching that have these problems for some regular expressions. But you can avoid those bad implementations without understanding anything.

> I encountered one a few months ago, so it’s not at all uncommon or difficult to encounter.

That only happens, if you use a regular expression matcher that uses backtracking. Sane regular expression matchers take linear time on all input strings.

> If you’re curious enough, you’d end up reading about how regexes work under the hood.

There's no one single way regular expressions work under the hood. You had the misfortune of using a matcher that uses backtracking and is prone to catastrophic exponential runtimes.

There's multiple different ways to implement regular expression matchers. But a user doesn't have to care or understand anything (they just need to avoid the buggy ones).

Sure, if you read up on how regular expressions work under the hood, you can learn to avoid those bad matchers. (Or you can learn how to live with your batch implementation, if you are feeling masochistic.)

But that's entirely optional: You only need to read one blog post that tells you to use grep and avoid eg Perl. You don't need to understand why backtracking is bad for regular expression matching; as long as you avoid those bad matchers.

I would say that knowing that there are such things as different families of regex compiler, what their internal methodologies and consequences are, so that you can avoid certain otherwise non-obvious problems that arise not from something you did wrong, but something a layer lower did wrong (you wrote a valid regex according to all the rules in the manual, and got a bad result) qualifies exactly as an example of needing to have some understanding of how the layers below work.

In fact it wasn't necessary for me to qualify that with "I would say". I do say, and it simply does. You've made exactly no argument this entire thread.

Maybe everyone doesn't need to know everything, but the skin of stuff anyone needs to know is thicker than 0, and has no absolute boundary layer either, you just generally need to know less the further from your own work. But that never goes all the way to 0. You have to rely on other people to have done their jobs, but you still at least have to understand what those jobs are, that they exist and how you ultimately interact with them and how they impact you.

You said so yourself several times which makes this all farcical.

I’m not sure if this response was simply for the sake of replying, with the claims of writing perfect code all the time or how backtracking implementation is uncommon or insane (most languages use backtracking implementations of regexes).
Huh? I'm not making any claims of writing perfect code. If you have a sane regular expression matcher, there are no catastrophic cases to avoid.

Regular expression libraries that don't use backtracking are available for many languages. Yes, some languages have bad libraries that use backtracking, too. But bad libraries will always exist, they aren't an excuse when there are more good libraries around then ever before.