by tams 1799 days ago
I'm pretty fond of using this tool to take trips down memory lane, revisiting lost content I used to enjoy.

Browsing through crawls has the neat side effect of letting me serendipitously discover things that I missed back in the day, just by having everything laid out on the file system.

PSA: There are a lot of holes in most crawls, even for popular stuff. A good way to ensure that you can revisit content later is to submit links to the Wayback Machine with the "Save Page Now" [1] functionality. Some local archivers like ArchiveBox [2] let you automate this. I highly recommend making a habit of it.

[1] https://web.archive.org/

[2] https://github.com/ArchiveBox/ArchiveBox

1 comment

Another convenient way to interact with "Save Page Now" is just to email a bunch of links to the savepagenow address at archive.org. I especially like to copy all the HTML of a page and paste it into an HTML email to get all the links.
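For anyone who would rather script the copy-and-paste step, here is a minimal sketch of the same idea: pull every link out of a page's HTML and put the links in an email body. The address `savepagenow@archive.org` is my reading of "the savepagenow address at archive.org", and I'm assuming the service also picks up URLs from a plain-text body, which I haven't verified. The names `LinkCollector`, `extract_links`, and `build_message` are made up for illustration, and actually sending the message (e.g. with `smtplib`) is left out.

```python
from email.message import EmailMessage
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                absolute = urljoin(self.base_url, value)
                is_web_link = absolute.startswith(("http://", "https://"))
                if is_web_link and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique http(s) links found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def build_message(links, sender):
    """Build (but do not send) an email listing one link per line."""
    msg = EmailMessage()
    msg["From"] = sender
    # Assumed address, inferred from the comment above.
    msg["To"] = "savepagenow@archive.org"
    msg["Subject"] = "Links to save"
    msg.set_content("\n".join(links))
    return msg
```

Feed `extract_links` the saved HTML plus the page's own URL (so relative links resolve), then hand the result to `build_message` and send it with whatever mail setup you already have.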

There are two things to note, neither of which is well advertised:

1. The parent comment you're replying to links to the main page for the Wayback Machine, which includes a Save Page Now widget, but Save Page Now actually has a dedicated page: <https://web.archive.org/save/>

2. If you have an archive.org account (it lets you submit and comment on collections; the library is bigger than just the Wayback Machine) and you visit the Save Page Now page while logged in, you get more options, including the option "Save outlinks".

Yeah, I use that API from the browser. I found the bulk, asynchronous, zero-download email API more convenient, since for a while the save API stopped supporting HEAD requests, although it seems to support them again now.
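For reference, a rough sketch of what a zero-download HEAD request against the save endpoint might look like. It assumes a capture can be triggered by appending the target URL to `https://web.archive.org/save/` (the dedicated page mentioned above); rate limits, authentication, and the response format are not covered, and as noted HEAD support has come and gone. The helper names `build_save_request` and `request_capture` are hypothetical.

```python
from urllib.request import Request, urlopen

# Dedicated Save Page Now page from the comment above; appending the target
# URL to it is an assumption about how the endpoint is addressed.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_request(target_url, method="HEAD"):
    """Build a request for the save endpoint without sending it."""
    return Request(SAVE_ENDPOINT + target_url, method=method)


def request_capture(target_url, timeout=60):
    """Send the HEAD request and return the HTTP status code.

    HEAD asks for headers only, so no page body is downloaded.
    """
    with urlopen(build_save_request(target_url), timeout=timeout) as response:
        return response.status
```

Only `request_capture` touches the network; `build_save_request` is split out so the URL and method can be checked before anything is sent.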