Hacker News new | ask | show | jobs
by btown 302 days ago
As someone who's been saved by look-aheads in many a situation, I'm quite partial to the approach detailed in [0]: use a regex library that checks for a timeout in its main matching loop.

This lets you have full backwards compatibility in languages like Python and JS/TS that support backreferences/lookarounds, without running any risk of DOS (including by your own handrolled regexes!)

And on modern processors, a suitably implemented check for a timeout would largely be branch-predicted to be a no-op, and would in theory result in no measurable change in performance. Unfortunately, the most optimized and battle-tested implementations seem to have either taken the linear-time NFA approaches, or have technical debt making timeout checks impractical (see comment in [0] on the Python core team's resistance to this) - so we're in a situation where we don't have the best of both worlds. Efforts like [1] are promising, especially if augmented with timeout logic, but early-stage.

[0] https://stackoverflow.com/a/74992735

[1] https://github.com/fancy-regex/fancy-regex