by azornathogron
151 days ago
Is crawling really solved? Any naive crawler will run into the problem that servers can give different responses to different clients, which means a site can show the crawler something different from what it shows real users. That turns crawling into an adversarial problem: the crawler developers need to be continually on the lookout for new ways servers can do malicious things that poison or mislead the index. Otherwise you'll return junk results from spammers who lied to the crawler. I've never done it, so maybe it's easier than I imagine, but I wouldn't be quick to assume that crawling is solved.
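
To make the "different responses to different clients" problem concrete, here is a rough sketch of one possible check: compare the body served to a crawler-like client with the body served to a browser-like client, and flag the page if they diverge too much. The function names and the 0.8 threshold are my own assumptions for illustration, not anything a real search engine is known to use, and legitimately dynamic pages (ads, timestamps, personalization) would need much more careful handling than this.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff: below this similarity the two views are treated as suspicious.
SIMILARITY_THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity score between two response bodies."""
    return SequenceMatcher(None, a, b).ratio()


def looks_cloaked(body_for_crawler: str, body_for_browser: str,
                  threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Flag a page when the crawler's view and the browser's view differ too much.

    Both bodies are assumed to have been fetched already, for example once
    with a crawler User-Agent and once with a browser User-Agent.
    """
    return similarity(body_for_crawler, body_for_browser) < threshold
```

Even a check like this only moves the problem: a server that keys on IP ranges or request timing rather than the User-Agent header would pass it, which is the adversarial part.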
But my impression is that it's more a question of scale and engineering time than of having to invent something new.
(Disclaimer: I have also never worked on an internet-scale search system, so maybe I'm way off base here as well.)