by Springtime 1225 days ago
I save everything I consider useful as an MHTML file (HTML/CSS/images in one file), a format Chromium browsers support natively. The page isn't split into separate files as with the normal save page option, it doesn't break when the file is renamed as the normal save does (on Windows at least), it removes the need for bookmarks, which go dead over time, and it preserves the original source URL in the file for later reference.
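
For anyone wanting to read that stored source URL back out programmatically, here is a minimal Python sketch. It assumes the URL sits in a top-level `Snapshot-Content-Location` header, which is what Chromium-saved files appear to use, and falls back to `Content-Location` otherwise. The function name `mhtml_source_url` is made up for illustration.

```python
from email import policy
from email.parser import BytesHeaderParser


def mhtml_source_url(path):
    """Return the source URL recorded in an MHTML file's top-level headers, or None."""
    with open(path, "rb") as fh:
        # MHTML is a MIME message, so the standard email parser can read its headers.
        headers = BytesHeaderParser(policy=policy.default).parse(fh)
    # Assumption: Chromium-style header first, generic Content-Location as a fallback.
    return headers.get("Snapshot-Content-Location") or headers.get("Content-Location")
```

Calling `mhtml_source_url("some page.mhtml")` on a saved file should give back the address it was saved from, if the browser recorded one under either header.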

I've saved over 10k such files this way. With practice it becomes second nature to categorize and tag them using just the filename, which makes them findable within seconds.
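
A file manager's search box is enough for this, but the same filename-only lookup can be sketched in a few lines of Python. This is only an illustration: the function name `find_saved_pages`, the folder layout, and the `.mhtml`/`.mht` extensions are assumptions.

```python
from pathlib import Path


def find_saved_pages(root, *keywords):
    """Return saved pages under root whose filename contains every keyword."""
    wanted = [k.lower() for k in keywords]
    matches = []
    for path in Path(root).rglob("*"):
        # Assumption: saved pages use the .mhtml or .mht extension.
        if path.is_file() and path.suffix.lower() in {".mhtml", ".mht"}:
            name = path.stem.lower()
            # Case-insensitive substring match against the filename only.
            if all(k in name for k in wanted):
                matches.append(path)
    return sorted(matches)
```

For example, `find_saved_pages("D:/saved", "linux", "latency")` would list every saved page whose name mentions both words (the path here is hypothetical).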

These have become like my own private search engine, without the problem of being unable to find answers you know exist (which has increasingly become an issue with online search engines).

In Chromium, saving to MHTML is enabled by launching the browser with the CLI argument `--save-page-as-mhtml` (Vivaldi enables this by default, with no arguments needed). Firefox used to support it via excellent addons up until the Quantum update, but it hasn't since, which is a dealbreaker for my daily use of it.

4 comments

MHTML is almost as old as Google and was first implemented by Internet Explorer in 2001. MHTML files are identical to .eml files too (it's plain old MIME), so you can load them in your email client if you're so inclined.

Not MHTML, I guess, and not native to the browser, but in Firefox you can use the popular SingleFile addon to save any web page into a single HTML file.

https://addons.mozilla.org/en-GB/firefox/addon/single-file/

Seems like a decent project, which has come up before. The sibling commenter ramraj07 would likely be interested in what seems to be a newly listed feature for automatically saving pages once they load, which according to that link works on Firefox for Android (mobile).

Back when Firefox supported the UnMHT addon, it also worked on Firefox for Android and had the nice feature of letting you customize which replacement Unicode character was used for illegal filename characters (instead of just using underscores like Chromium does and SingleFile says it does).

Btw, the SingleFile comparison table mentions that MHTML can't be 'unzipped' to extract the resources, but there is an open source Windows program called ExtractMHT[1] that can extract them, which was later bundled with Universal Extractor.

[1] https://www.legroom.net/software/extractmht
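
Since MHTML is plain MIME (as noted upthread), the resources can in principle also be pulled out with Python's standard `email` package. The sketch below is not how ExtractMHT works; it's a rough illustration, and the function name `extract_mhtml` and the generic `partNNNN` output names are my own choices rather than anything from an existing tool.

```python
import mimetypes
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_mhtml(mhtml_path, out_dir):
    """Write each MIME part of an MHTML file into out_dir; return the written paths."""
    with open(mhtml_path, "rb") as fh:
        message = BytesParser(policy=policy.default).parse(fh)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, part in enumerate(message.walk()):
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        # decode=True undoes the base64 / quoted-printable transfer encoding.
        data = part.get_payload(decode=True)
        if data is None:
            continue
        # Pick an extension from the part's declared type; fall back to .bin.
        ext = mimetypes.guess_extension(part.get_content_type()) or ".bin"
        target = out / f"part{index:04d}{ext}"
        target.write_bytes(data)
        written.append(target)
    return written
```

The output files are numbered rather than named after their original URLs; each part's `Content-Location` header could be used to recover those names if needed.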

MHTML looks interesting. To avoid "Save as" splitting pages into multiple files, I've been capturing websites as screenshots to preserve the layout and printing them as PDFs to preserve the content, and honestly it works so-so...

I want this but done automatically (save every webpage I spend more than 10 seconds on) and also working on my phone. Which is impossible, I suppose.

There are some 'save everything' programs/extensions I've seen mentioned in the past (and in this topic), but I've wondered how users deal with the signal-to-noise ratio and with later filtering for desired content, or whether they find the storage requirements worth it.

Often when searching for a specific thing, be it technical queries, useful reviews, troubleshooting, etc., it can take many, many searches and a lot of looking through pages before finding something worthwhile, at which point I'll save it (and name it with keywords for future me), since the effort required to find it is not worth repeating. I'm not sure how such 'save everything' utilities would make that part easier when I wish to return to the relevant page in the future (i.e., finding the relevant page among all the irrelevant ones when they share so many similar internal keywords).