by ddorian43 1319 days ago
Crawling should be the easiest part.

I don't know if there is an easy part in search. Almost every aspect of it has unique challenges.

Large-scale crawling is primarily a logistics problem: balancing the work in a way that is kind to both the crawler and the data consumers.

Distributed crawling, if you go that way, is also non-trivial, as you're effectively juggling shared, rapidly mutating state in the dozens of gigabytes.
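
To make the state-juggling point a bit more concrete, here is a minimal sketch of one common way people try to shrink the shared part: partition the URL frontier by host, so each worker owns the state for its own hosts. This is an assumption for illustration, not a description of any particular crawler; `shard_for` and `partition` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def shard_for(url: str, num_workers: int) -> int:
    """Map a URL to a worker index by hashing its host name.

    All URLs on the same host land on the same worker, so per-host
    state (visited set, queue, rate limits) never has to be shared.
    """
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(urls, num_workers):
    """Split a batch of URLs into one list per worker."""
    shards = [[] for _ in range(num_workers)]
    for url in urls:
        shards[shard_for(url, num_workers)].append(url)
    return shards
```

Even with this, links discovered on one worker still have to be routed to whichever worker owns the target host, and changing the number of workers reshuffles the assignment, so the coordination problem gets smaller rather than going away.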