by jdrock 6188 days ago
Yep - that's exactly it. Setting up the infrastructure to handle large, web-scale content analysis is the real challenge. (Shameless plug alert) That's why we set up 80legs: to give everyone not called Google/Yahoo/Microsoft comparable capabilities when it comes to this.
2 comments

I'd hate to see your power & network bill :)

But I think that once you have enough customers, the cost of crawling goes down for every new customer you sign up, because you only need to crawl a page once and you can sell the crawled result to many customers. Or am I misreading your model, and every page is crawled over and over again for every user?

Right now we crawl again for each user, but as we scale up we're going to start doing some caching and providing data streams.
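
For illustration, the crawl-once, serve-many idea discussed above could look roughly like the sketch below. This is only a minimal sketch under stated assumptions, not 80legs' actual design: `SharedCrawlCache` and `fake_fetch` are hypothetical names, the fetcher is a stand-in for a real HTTP crawl, and there is no expiry or re-crawl logic.

```python
from typing import Callable, Dict


class SharedCrawlCache:
    """Hypothetical cache that fetches each URL once and reuses the result."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                # function that actually crawls a URL
        self._pages: Dict[str, str] = {}   # url -> crawled content
        self.fetch_count = 0               # number of real crawls performed

    def get(self, url: str) -> str:
        # Crawl only on a cache miss; later requests reuse the stored page.
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
            self.fetch_count += 1
        return self._pages[url]


def fake_fetch(url: str) -> str:
    # Stand-in for a real crawl (assumption: no network access here).
    return f"<html>content of {url}</html>"


cache = SharedCrawlCache(fake_fetch)
# Three hypothetical customers ask for the same page.
for customer in ("alice", "bob", "carol"):
    page = cache.get("http://example.com/")

print(cache.fetch_count)  # prints 1: the page was crawled once for all three
```

With per-user crawling the count would grow with the number of customers; with a shared cache it grows only with the number of distinct pages.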