by skwirl 303 days ago
The database is being reverse engineered and published anyways, as per the article.

I think Archive is just rehydrating shortened links in webpages that have been archived. I doubt they’re discovering previously unknown URLs.
No, they really are trying to enumerate all 230 billion possible shortlinks; that’s why they need so many people to help crawl everything.
Got a source? I don’t see details one way or another.
From the article:

> there are about 230 billion* links that need visiting

> * Thanks to arkiver on the Archive Team IRC for correcting this number.

Also, when running the Warrior project you could see it iterating through the range. I don't have any logs handy since the project is finished, but they looked a bit like:

  https://goo.gl/gEdpoS: 404 Not Found
  https://goo.gl/gEdpoT: 404 Not Found
  https://goo.gl/gEdpoU: 302 Found -> https://...
  https://goo.gl/gEdpoV: 404 Not Found
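
For anyone curious what "iterating through the range" might look like, here is a minimal sketch of stepping through short codes in odometer order. This is not the Warrior's actual code: the 62-character alphabet, its ordering, and the helper names `next_code` and `iter_codes` are all assumptions made for illustration, and the sketch only generates URLs without making any requests.

```python
import string
from itertools import islice

# Assumed alphabet (digits, lowercase, uppercase). The real character set
# and ordering used by the crawl are not stated in this thread.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def next_code(code, alphabet=ALPHABET):
    """Return the code that follows `code`, counting like an odometer."""
    chars = list(code)
    i = len(chars) - 1
    while i >= 0:
        pos = alphabet.index(chars[i])
        if pos + 1 < len(alphabet):
            # No carry needed: bump this position and stop.
            chars[i] = alphabet[pos + 1]
            return "".join(chars)
        # This position wraps around; carry into the position to the left.
        chars[i] = alphabet[0]
        i -= 1
    # Every position wrapped, so move on to the next code length.
    return alphabet[0] * (len(code) + 1)


def iter_codes(start, alphabet=ALPHABET):
    """Yield `start` and every code after it, without end."""
    code = start
    while True:
        yield code
        code = next_code(code, alphabet)


# Print the four URLs a worker would visit starting from gEdpoS.
for code in islice(iter_codes("gEdpoS"), 4):
    print(f"https://goo.gl/{code}")
```

Under that assumed alphabet, starting from `gEdpoS` yields the same four codes as the log lines above; a real worker would presumably be handed a sub-range and record the status of each request.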