| HN Mirror


by tams 1846 days ago

I'm pretty fond of using this tool to take trips down memory lane, revisiting lost content I used to enjoy.

Browsing through crawls has the neat side effect of letting me serendipitously discover things I missed back in the day, just by having everything laid out on the file system.

PSA: There are a lot of holes in most crawls, even for popular stuff. A good way to ensure that you can revisit content later is to submit links to the Wayback Machine with the "Save Page Now" [1] functionality. Some local archivers like ArchiveBox [2] let you automate this. I highly recommend making a habit of it.

[1] https://web.archive.org/

[2] https://github.com/ArchiveBox/ArchiveBox

1 comments

pabs3 1846 days ago

Another convenient way to interact with "Save Page Now" is to email a bunch of links to the savepagenow address at archive.org. I especially like to copy all the HTML of a page and paste it into an HTML email to get all the links.
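If you would rather collect a page's links programmatically than paste raw HTML, here is a minimal sketch using only the Python standard library. The names `LinkCollector` and `extract_links` are made up for illustration, and the sketch assumes you already have the page's HTML as a string and only care about `<a href>` links; it does not send any email.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Skip mailto:, javascript:, etc., and skip duplicates.
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique http(s) links found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    sample = (
        '<p><a href="/about">About</a> '
        '<a href="https://example.org/x">X</a> '
        '<a href="mailto:a@b.c">mail</a></p>'
    )
    for link in extract_links(sample, "https://example.com/page"):
        print(link)
```

The resulting list, one URL per line, could then be pasted into the body of the email.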


cxr 1845 days ago

There are two things to note, neither of which is well advertised:

1. The parent comment you're replying to links to the main page for the Wayback Machine, which includes a Save Page Now widget, but Save Page Now actually has a dedicated page: <https://web.archive.org/save/>.

2. If you have an archive.org account (which lets you submit and comment on collections; the library is bigger than just the Wayback Machine) and you visit the Save Page Now page while logged in, you get more options, including the option "Save outlinks".


toomuchtodo 1845 days ago

You can also kick off retrievals from the command line:

https://github.com/pastpages/savepagenow

https://github.com/overcast07/wayback-machine-spn-scripts


pabs3 1845 days ago

Yeah, I use that API from the browser. I found the bulk, asynchronous, zero-download email API more convenient because, for a while, the save API stopped supporting HEAD requests, although it seems to support them again now.
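For anyone who wants to script the HEAD approach, a rough sketch follows. It assumes that appending the target URL to the <https://web.archive.org/save/> address mentioned upthread triggers a capture for anonymous requests, and it falls back to GET in case HEAD is rejected again. The helper names `build_save_request` and `submit` are illustrative, and rate limits and authentication are not handled.

```python
import sys
import urllib.error
import urllib.request

# Dedicated Save Page Now address mentioned earlier in the thread.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_request(target_url, method="HEAD"):
    """Build (but do not send) a request asking for target_url to be captured."""
    return urllib.request.Request(SAVE_ENDPOINT + target_url, method=method)


def submit(target_url, timeout=60):
    """Try HEAD first, then GET; return (method, HTTP status code)."""
    last = None
    for method in ("HEAD", "GET"):
        try:
            request = build_save_request(target_url, method)
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return method, response.status
        except urllib.error.HTTPError as err:
            # Remember the failure and try the next method, if any.
            last = (method, err.code)
    return last


if __name__ == "__main__":
    # Usage (assumed file name): python spn_head.py https://example.com/ ...
    for url in sys.argv[1:]:
        print(url, *submit(url))
```

HEAD avoids downloading the response body, which is the "zero-download" property mentioned above; network errors other than HTTP error responses are left to propagate.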
