Hacker News
by BenFeldman1930 1099 days ago
When I want to preserve online content, the first step is to take it offline by storing a local copy, usually by using something like Reader Viewer and exporting it to PDF. That sometimes doesn't look as good as the website, but content is more important. The directory holding these copies is indexed by `recoil`, so I can later find this and similar content.
5 comments

As I've noted, the Einkbro web browser (optimised for e-ink, though usable on emissive-display devices as well, Android only) has a "save to ePub" feature.

This does, of course, strip all site styling. Images / artwork may be preserved, depending on how they are presented in the original HTML. (If using <img> tags, yes, if served through JS ... not so much.)
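
To illustrate the <img> point, here is a rough Python sketch of what a saver that only reads the static HTML gets to see. This is not Einkbro's actual code, only an assumption about how such a tool might behave; `ImgCollector`, `static_image_sources`, and the sample page are hypothetical names made up for the example.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src of every <img> tag present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def static_image_sources(html):
    """Return image URLs visible without running any JavaScript."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical page: one plain <img>, one image that only a script would add.
sample = """
<article>
  <img src="figure1.png" alt="plain image tag">
  <div id="gallery"></div>
  <script>
    document.getElementById("gallery").innerHTML = "<img src='lazy.png'>";
  </script>
</article>
"""

print(static_image_sources(sample))  # ['figure1.png']
```

The image written by the script never shows up, because script contents are treated as raw text rather than markup. A tool would have to actually run the page's JS to capture it.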

What's genius, though, is that multiple articles can be saved to a single document, which can be appended over time.

I use this to group items by similar topic, project, or interest, or to create a "BOTI" archive, where "BOTI" stands for "best of the interval". The idea here is to select the best items over a given period of time (week, month, year, etc., though I seem to be settling on six to twelve months), and organise those in a single place. You've Still Got to Read It[tm], but at least you've organised the material.

Reading this on an e-book reader, using the bookreader software, is also virtually always preferable to reading the same content online in a Web browser.

I just use ArchiveBox, really. It has replaced all my “read later” tools and I can extract the PDFs for annotations.
I do this as well, but I don't like setting it to Reader Mode because the site design disappears. Unfortunately it's necessary because "print to PDF" will obscure text due to including the stupid sticky nav bars and cookie notices in the PDF.
OneNote can be good for this (if you're OK with Microsoft) - the Clip to OneNote browser extension is pretty decent, and it saves the page content, link, and time of capture.
My favorite is the "Save as MHTML" Chrome extension (the one with a check box).