by a5huynh
1431 days ago
For those looking for an alternative to that, I've been building a self-hosted search engine that crawls only what you want, based on a basic set of rules. The rules can be a list of domains, a very specific list of URLs, and/or even some basic regexes. https://github.com/a5huynh/spyglass
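To make the three rule types concrete, here is a rough sketch of how a crawler might decide whether a URL is in scope. This is not how Spyglass does it or what its config looks like; the `CrawlRules` class, its parameter names, and the example hosts are all made up for illustration.

```python
import re
from urllib.parse import urlsplit


class CrawlRules:
    """Hypothetical allow-list for a crawler (not Spyglass's actual format)."""

    def __init__(self, domains=(), urls=(), patterns=()):
        self.domains = {d.lower() for d in domains}         # whole domains, subdomains included
        self.urls = set(urls)                               # exact URLs only
        self.patterns = [re.compile(p) for p in patterns]   # regexes matched against the full URL

    def allows(self, url):
        # Rule 1: the URL is explicitly listed.
        if url in self.urls:
            return True
        # Rule 2: the host is an allowed domain or one of its subdomains.
        host = (urlsplit(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in self.domains):
            return True
        # Rule 3: any regex matches somewhere in the URL.
        return any(p.search(url) for p in self.patterns)


rules = CrawlRules(
    domains=["example.org"],
    urls=["https://news.example.com/item?id=1"],
    patterns=[r"^https://docs\.example\.net/v\d+/"],
)
print(rules.allows("https://blog.example.org/post"))
print(rules.allows("https://news.example.com/item?id=2"))
```

A crawler would call `allows` on every discovered link before queueing it, so anything outside the rules never gets fetched.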
Which raises the question: does archive.org offer their Wayback Machine index for download anywhere? Technically, why should anyone go through the trouble of crawling the web if archive.org has been doing it for years, and likely has one of the best indexes around? I've seen some 3rd-party downloaders for specific sites, but I'd like the full thing. Yes, I realize it's probably petabytes of data, but maybe it could be trimmed down to just the most recent crawls.
If there were a way to have that index locally, it would make for a very powerful search engine when combined with a tool like yours.