Hacker News new | ask | show | jobs
by ot 103 days ago
> are there eviction techniques to guard against this?

RE2 resets the cache when it reaches a (configurable) size limit. Which I found out the hard way when I had to debug almost-periodic latency spikes in a service I managed, where a very inefficient regex caused linear growth in the Lazy DFA, until it hit the limit, then all threads had to wait for its reset for a few hundred milliseconds, and then it all started again.

I'm not sure if dropping the whole cache is the only feasible mitigation, or some gradual pruning would also be possible.

Either way, if you cannot assume that your cache grows monotonically, synchronization becomes more complicated: the trick mentioned in the other comment about only locking the slow path may not be applicable anymore. RE2 uses RW-locking for this.

1 comments

I have experienced this as well, the performance degradation of DFA to NFA is enormous and while not as bad as exponential backtracking, it's close to ReDoS territory.

The rust version of the engine (https://github.com/ieviev/resharp) just returns an Error instead of falling back to NFA, I think that should be a reasonable approach, but the library is still new so i'm still waiting to see how it turns out and whether i had any oversights on this.

Here RE2 does not fall back to the NFA, it just resets the Lazy DFA cache and starts growing it again. The latency spikes I was mentioning are due to the cost of destroying the cache (involving deallocations, pointer chasing, ...)
Ah, sorry then i misunderstood the comment

I'm not sure if it's with both RE2 or Rust, but some internal engines of Rust appear to allocate a fixed buffer that it constantly re-creates states into.

I'm not really familiar with the eviction technique of RE2 but I've done a lot of benchmark comparisons. A good way to really stress test RE2 is large Unicode classes, \w and \d in RE2 are ascii-only, i've noticed Unicode (\p{class}) classes very drastically change the throughput of the engine.