Hacker News
by qntm 5064 days ago
> How about we just make a "blacklist" of these known bots, look up every user agent, and compare against the blacklist? So now every single request to your site has to do a substring match against every single term in this list. Depending on your site's implementation, this is probably not trivial to do without taking some sort of performance hit.

Build a finite state machine which accepts exactly those strings that contain a term from the blacklist. That should be a one-time operation.

Then feed each request into the FSM and see if you get a match. Execution time is linear in the length of the request, regardless of the number of terms in the blacklist.
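A minimal sketch of that idea, assuming an Aho-Corasick style automaton (one standard way to build such a machine; the comment itself does not name a construction). The names `build_automaton` and `matches` are made up for illustration, and the caller is assumed to lowercase the user agent before matching.

```python
from collections import deque

def build_automaton(terms):
    """One-time build: a trie over the terms plus failure links."""
    goto = [{}]        # goto[state][char] -> next state
    fail = [0]         # fail[state] -> state for the longest proper suffix in the trie
    accept = [False]   # accept[state] -> some term ends at this state
    for term in terms:
        if not term:
            continue   # an empty term would match every input, so skip it
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                accept.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        accept[state] = True
    # Breadth-first pass to fill in failure links, shallowest states first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also accepts if a shorter term ends at its suffix state.
            accept[nxt] = accept[nxt] or accept[fail[nxt]]
            queue.append(nxt)
    return goto, fail, accept

def matches(automaton, text):
    """Scan text once; True as soon as any blacklisted term has been seen."""
    goto, fail, accept = automaton
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if accept[state]:
            return True
    return False
```

Usage would be something like `automaton = build_automaton(blacklist)` once at startup, then `matches(automaton, user_agent.lower())` per request. The scan follows at most one forward transition per input character, and the failure-link steps are amortized against those, so the per-request cost does not grow with the number of terms.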

1 comment

A Perl regular expression will test for a match in less than a microsecond. Other scripting languages are similarly fast.

There's no need to program like it's 1985 any more.
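For comparison, a rough sketch of the regex approach, written in Python rather than Perl purely for illustration. The helper name `compile_blacklist` is hypothetical, the term list is assumed to be non-empty, and no timing is claimed here; Python's `re` engine backtracks, so its cost may still depend on how many alternatives there are.

```python
import re

def compile_blacklist(terms):
    """Compile all terms into one case-insensitive alternation (one-time cost)."""
    # Escape each term so characters like '.' or '/' are matched literally.
    escaped = [re.escape(term) for term in terms if term]
    return re.compile("|".join(escaped), re.IGNORECASE)

# Hypothetical example terms, not taken from any real blacklist.
blacklist_re = compile_blacklist(["badbot", "scraper"])

def is_blacklisted(user_agent):
    """True if any blacklisted term occurs anywhere in the user agent."""
    return blacklist_re.search(user_agent) is not None
```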

Hey, you show me a performance problem, I show you a solution. Preferably one involving finite state machines.