by mycall 1612 days ago
I wish web.archive.org had an index made by someone like Common Crawl. There is a lot of great stuff on archive.org.
2 comments

web.archive.org has a CDX index, similar to Common Crawl.

Since I use both of these archives together, I wrote this code to iron out the differences between them:

https://github.com/cocrawler/cdx_toolkit
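
To give a feel for the kind of differences involved, here is a rough sketch using only the Python standard library. It is not how cdx_toolkit does it. It assumes both services are queried with output=json: the Wayback CDX server then returns a single JSON array whose first row is the field names, while the Common Crawl index returns one JSON object per line with slightly different field names. The crawl ID in CC_INDEX is only an example (the current list is at index.commoncrawl.org/collinfo.json), and the keys produced by normalize() are a made-up schema for illustration.

```python
import json
from urllib.parse import urlencode

WAYBACK_CDX = "https://web.archive.org/cdx/search/cdx"
# Common Crawl has one index per crawl; this crawl ID is only an example.
CC_INDEX = "https://index.commoncrawl.org/CC-MAIN-2024-10-index"


def build_query(endpoint, url, limit=5):
    """Return a CDX query URL that asks for JSON output."""
    params = {"url": url, "output": "json", "limit": limit}
    return endpoint + "?" + urlencode(params)


def parse_wayback(body):
    """Wayback returns one JSON array of rows; the first row is the header."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in records]


def parse_common_crawl(body):
    """Common Crawl returns one JSON object per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def normalize(record):
    """Map either record shape onto a small shared set of keys (hypothetical schema)."""
    return {
        "url": record.get("original") or record.get("url"),
        "timestamp": record["timestamp"],
        "status": record.get("statuscode") or record.get("status"),
        "mime": record.get("mimetype") or record.get("mime"),
    }
```

Fetch build_query(WAYBACK_CDX, "example.com") or build_query(CC_INDEX, "example.com") with whatever HTTP client you like, pass the response text to the matching parser, and run each record through normalize() to get the same keys from both archives.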

Hey! I was using your tool a couple months ago. It was super helpful for my project.
Thanks! I rarely hear from users, great to hear from you!
They do, and it's better than Common Crawl's by my testing.