Ask HN: Is there a service that offers Common Crawl as an API?
7 points by georgehill 401 days ago
I am trying to do some data analysis work. I don't want the full dataset. I want only two things: give me the hostname, and give me all the pages or URLs with their HTML.
2 comments

There's index.commoncrawl.org, where you can query a domain with wildcards.
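For anyone who wants to try this, here is a rough sketch of how the index lookup could be wired up in Python. It assumes the crawl ID "CC-MAIN-2024-10" (just an example; the current list is at index.commoncrawl.org/collinfo.json), that each index result carries "filename", "offset" and "length" fields, and that the WARC files are served from data.commoncrawl.org. The function names are made up for illustration.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"  # assumed download host for WARC files


def build_index_url(crawl_id, host):
    """Build an index query for every captured URL under `host`.

    `crawl_id` (e.g. "CC-MAIN-2024-10") is an example value; pick a real
    one from collinfo.json.
    """
    query = urlencode({"url": f"{host}/*", "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def parse_index_lines(text):
    """The index answers with one JSON object per line, not a JSON array."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def byte_range(record):
    """HTTP Range header value covering exactly one WARC record."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1
    return f"bytes={start}-{end}"


def fetch_warc_record(record):
    """Download and decompress one record (WARC headers, HTTP headers, HTML)."""
    req = Request(
        f"{DATA_HOST}/{record['filename']}",
        headers={"Range": byte_range(record)},
    )
    with urlopen(req) as resp:
        return gzip.decompress(resp.read())


def list_captures(crawl_id, host):
    """Query the index over the network and return the parsed records."""
    with urlopen(build_index_url(crawl_id, host)) as resp:
        return parse_index_lines(resp.read().decode("utf-8"))
```

Usage would be something like `records = list_captures("CC-MAIN-2024-10", "example.com")` followed by `fetch_warc_record(records[0])`. Large sites are paginated by the index, which this sketch does not handle, so treat it as a starting point rather than a complete client.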
Not that I know of, but there are various tools like https://github.com/alwalxed/wayurls
Thank you, will check this out.