Hacker News new | ask | show | jobs
by kentonv 3020 days ago
Yeah, trouble is, it's depressingly easy to create a regex that performs very poorly, so if we allowed that, people could easily break the service.

However! One of the neat things about Workers is that you totally can use regexes in JavaScript, and our sandbox prevents runaway CPU usage. So if you really wanted regex-based page rules before, you can probably get that with a Worker.

We've exposed an API for controlling Cloudflare features from a Worker, including many things commonly controlled via page rules:

https://developers.cloudflare.com/workers/reference/cloudfla...

We'll be adding more in the future.

1 comments

> a regex that performs very poorly

Would that still be true if you limited the regex support to DFAs instead of NFAs? I can’t think of a single use-case for backreferences in routing patterns. :)

Good idea; a nit on terminology:

NFAs and DFAs have equivalent expressive power[1]. For this application, you probably want a determinized DFA (one without epsilon transitions); that's constant time on the input string. (It's still not as fast as Boyer-Moore.)

Backrefs let you write PDAs, which are more powerful than either NFAs or DFAs.

PCRE2[2] implements DFA-based matching. The cited page says it's slower than the NFA algorithm (although presumably a lower complexity order); I think because of lookaheads and capturing.

[1] Michael Sipser, _Introduction to the Theory of Computation_.

[2] http://www.pcre.org/current/doc/html/pcre2matching.html

> determinized DFA

Yep, that's the thing I was thinking of. Thanks!

Question, since you seem like you might know: is there an alternative regex library—or a plain Yacc parsing grammar that you could stick as a validator in front of PCRE's parser—that would allow through only the regexes which parse out to constant-time (determinized) DFAs?

Not an expert in this, but I'm thinking this is usually handled by switching to a non-backtracking regexp engine. Otherwise, any such validator would depend on details of the regexp engine's implementation.