Hacker News new | ask | show | jobs
by toomuchtodo 1091 days ago
Crawl the internet archive Wayback machine slowly. Once you have built your index, you can fill your corpus gaps with public crawls. Patron services can advise to be helpful without causing stress on their infra.