Hacker News
by pooper 867 days ago
How do you crawl the web? Do you follow links around? How do you reach a page that isn't linked from anywhere you've crawled?
2 comments

I'm just using Common Crawl for now.
I mean, that's what web crawling is, right? By extension, you can't reach a page unless you stumble upon a link to it _somewhere_. Google also lets you submit a URL directly and schedule a crawl that way, which covers pages that aren't linked from anywhere.
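Here's a rough sketch of the "follow links around" idea. It is a toy breadth-first crawler: a queue (the frontier) starts from seed URLs, and every page you fetch adds its unseen links to the queue. Everything here is hypothetical for illustration. `PAGES` is an in-memory dict standing in for real HTTP fetches, and `crawl`, `extract_links` and `fetch` are made-up names, not any particular crawler's API. A page nobody links to (`/orphan`) is never found unless you add it as a seed yourself, which is roughly what submitting a URL amounts to.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links and drop #fragments so each page is counted once.
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from seeds; fetch(url) returns HTML or None."""
    frontier = deque(seeds)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable / missing page
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web": /orphan exists but nothing links to it.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="https://example.com/b">B</a>',
    "https://example.com/b": '<a href="/">home</a>',
    "https://example.com/orphan": "<p>nobody links here</p>",
}

if __name__ == "__main__":
    # Starting from the home page only: /orphan is never discovered.
    print(crawl(["https://example.com/"], PAGES.get))
    # "Submitting" the orphan URL as an extra seed makes it reachable.
    print(crawl(["https://example.com/", "https://example.com/orphan"], PAGES.get))
```

A real crawler would swap `PAGES.get` for an HTTP fetch and add things like politeness delays and robots.txt handling, but discovery still works the same way: links from pages you already have, plus whatever seeds you're given.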