Hacker News
by randomstring 1983 days ago
I would estimate that more than 90% of that traffic is bot traffic. I have run two web search engines in the past: search.netscape.com (pre-Google) and blekko.com. Robots accounted for more than 80% of traffic at Netscape (around 3M searches/day in 2000, IIRC) and definitely more than 80% at blekko, maybe 90% or more. Some traffic is obviously bot traffic (single source IP, common patterns, obvious bot user agents), and then there's the non-obvious bot traffic that looks random-ish but in aggregate is clearly bot traffic. For instance, way too many queries match the pattern "(mortgage|home loans) (zip|county|city|state)", even though they come from random IPs and user agents.
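
To make the "in aggregate" point concrete, here is a minimal sketch of how one might measure the share of queries matching a template, regardless of source IP or user agent. The function names, the baseline share, and the multiplier are all assumptions for illustration, not how Netscape or blekko actually did it.

```python
import re

# The pattern quoted in the comment above, matched literally for illustration.
TEMPLATE = re.compile(r"(mortgage|home loans) (zip|county|city|state)")

def template_share(queries, template):
    """Return the fraction of queries that fully match the template."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if template.fullmatch(q.strip().lower()))
    return hits / len(queries)

def looks_like_bot_campaign(queries, template, expected_share=0.001, factor=10):
    """Flag the template when its share of traffic far exceeds an assumed
    organic baseline (expected_share and factor are made-up defaults)."""
    return template_share(queries, template) > expected_share * factor
```

The point of the design is that no single query is suspicious on its own; only the aggregate share across all sources gives the pattern away.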

At blekko, under high traffic, we would load-shed obvious bot traffic first and prioritize searches from humans.
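
A rough sketch of that kind of prioritized load shedding, assuming each request already carries a hypothetical `obvious_bot` flag and that capacity is a simple request count (both are assumptions; the comment does not say how blekko implemented it):

```python
def shed_load(requests, capacity):
    """Keep at most `capacity` requests, dropping obvious bots before humans.

    Each request is a dict with a boolean "obvious_bot" key (hypothetical shape).
    """
    humans = [r for r in requests if not r["obvious_bot"]]
    bots = [r for r in requests if r["obvious_bot"]]
    kept = humans[:capacity]
    # Whatever capacity remains after serving humans goes to bot traffic.
    kept += bots[: capacity - len(kept)]
    return kept
```

When the batch fits within capacity, nothing is dropped; bots only lose out once the system is over its limit.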

2 comments

At Mojeek, which also has its own crawler/index, like Blekko, we see around 80% bot searches. That percentage has also grown as we have grown.

Would syndication search services, like DDG or Startpage, see a higher level of bot traffic than a crawler search engine? We don't know and it could depend on several factors. Bot traffic handling is certainly an ongoing and important challenge.

This is a fascinating trend. How could you even launch a search service today when that portion of your processing time is going to be spent powering bots (ad agencies?) rather than addressing human queries?

Wonder what a human-centric 'search' experience will look like in 2025... no more search bar, pre-emptive article fetch based on whatever some ML algorithm decides for you?

This reminds me of someone saying that eventually Amazon will just start sending you things and charging you for them with no intervention on your end.

It'll just happen to be what you needed at the right time.

This is both horrifying and tantalizing.
Once they can deliver my groceries as well right when I need them, I might be willing to sacrifice my privacy.
Exactly. If you can accurately ignore 50% of the bot traffic, you roughly halve your hardware expense. The trick is doing it in such a way that the bots don't notice.
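
As a back-of-the-envelope check, assuming hardware cost scales linearly with request volume (an assumption, not something stated in the thread) and using the roughly 90% bot share estimated upthread:

```python
def load_reduction(bot_share, ignored_fraction):
    """Fraction of total traffic removed when `ignored_fraction` of the bot
    traffic is dropped, with bots making up `bot_share` of all traffic."""
    return bot_share * ignored_fraction

# With 90% bot traffic, ignoring half of it removes 45% of total load.
saving = load_reduction(0.9, 0.5)
```

So "halve" is approximately right at a 90% bot share; at an 80% share the same policy removes about 40% of the load.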