Hacker News
Ask HN: Is there a service that offers Common Crawl as an API?
7 points
by
georgehill
401 days ago
I am trying to do some data analysis work, and I don't want the full dataset. I only want two things: given a hostname, give me all of its pages (URLs) with their HTML.
2 comments
pluto_modadic
389 days ago
There's index.commoncrawl.org, where you can query a domain with wildcards.
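A minimal Python sketch of how the index can get you from a hostname to HTML: query the CDX index for `hostname/*` with `output=json`, then fetch each matching WARC record from data.commoncrawl.org with an HTTP Range request built from the record's `offset` and `length`, and pull out the HTTP body. Assumptions: the crawl ID `CC-MAIN-2024-10` is only an example (the list of crawls is at index.commoncrawl.org/collinfo.json); the helper names are hypothetical; only the first page of index results is read (large hosts are paginated); and the payload is returned as stored, without handling any content or transfer encoding.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumption: example crawl ID; pick one from index.commoncrawl.org/collinfo.json
CRAWL_ID = "CC-MAIN-2024-10"
INDEX_BASE = "https://index.commoncrawl.org"
DATA_BASE = "https://data.commoncrawl.org"


def index_query_url(hostname: str, crawl_id: str = CRAWL_ID) -> str:
    """Build a CDX index query matching every URL under `hostname`."""
    params = urllib.parse.urlencode({"url": f"{hostname}/*", "output": "json"})
    return f"{INDEX_BASE}/{crawl_id}-index?{params}"


def parse_index_lines(text: str) -> list[dict]:
    """The index answers with one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def range_header(record: dict) -> str:
    """Byte range of one gzip-compressed WARC record inside its WARC file."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1
    return f"bytes={start}-{end}"


def html_from_warc_bytes(raw: bytes) -> str:
    """Decompress one WARC record and return the HTTP response body."""
    data = gzip.decompress(raw)
    # Layout: WARC headers, blank line, content block (HTTP headers, blank line, body).
    warc_head, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in warc_head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    block = rest[:length]  # drop the trailing CRLFs after the content block
    _http_head, _, body = block.partition(b"\r\n\r\n")
    return body.decode("utf-8", errors="replace")


def fetch_pages(hostname: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return (url, html) pairs for up to `limit` successful captures (network access)."""
    with urllib.request.urlopen(index_query_url(hostname)) as resp:
        records = parse_index_lines(resp.read().decode("utf-8"))
    ok = [r for r in records if r.get("status") == "200"]
    pages = []
    for rec in ok[:limit]:
        req = urllib.request.Request(
            f"{DATA_BASE}/{rec['filename']}",
            headers={"Range": range_header(rec)},
        )
        with urllib.request.urlopen(req) as resp:
            pages.append((rec["url"], html_from_warc_bytes(resp.read())))
    return pages
```

For example, `fetch_pages("example.com", limit=5)` would return up to five (url, html) pairs. Each WARC record is its own gzip member, which is why a single Range request plus `gzip.decompress` is enough; for messier records, a dedicated WARC parser such as `warcio` is more robust.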
phillipseamore
401 days ago
Not that I know of, but there are various tools, like
https://github.com/alwalxed/wayurls
georgehill
401 days ago
Thank you, will check this out.