by jll29 732 days ago
Microsoft Internet Explorer (no, I'm not using it personally) had a file format called *.mht that could save an HTML page together with all the files referenced from it, like inline images. I believe you could not store more than one page in one *.mht file, though, so your work could be seen as an extension.

Although UNIX philosophy posits that it's good to have many small files, I like your idea for its contribution to reducing clutter (imagine running 'tree' in both scenarios) and also avoiding running out of inodes in some file systems (maybe less of a problem nowadays in general, not sure as I haven't generated millions of tiny files recently).

4 comments

.mht is alive and well. It is a MIME wrapper around the files and is generated by Chrome, Opera, and Edge's save option "Webpage as single file", which defaults to an extension of .mhtml.
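
To illustrate the "MIME wrapper" point: since the format is a multipart MIME message, Python's standard `email` package can list what is inside one. This is only a rough sketch. The function name `list_mhtml_parts`, the sample bytes, and the example.com URLs are made up for illustration, and real browser-saved files carry more headers than this hand-written sample.

```python
from email import message_from_bytes


def list_mhtml_parts(data: bytes) -> list[tuple[str, str]]:
    """Return (content type, Content-Location) for each non-container part."""
    message = message_from_bytes(data)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        parts.append((part.get_content_type(), part.get("Content-Location", "")))
    return parts


# Hand-written stand-in for a browser-saved file (hypothetical URLs and content).
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Location: https://example.com/\r\n"
    b"\r\n"
    b"<html></html>\r\n"
    b"--b1\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: https://example.com/logo.png\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--b1--\r\n"
)

for content_type, location in list_mhtml_parts(SAMPLE):
    print(content_type, location)
```

Each saved resource is its own MIME part, and the `Content-Location` header records the URL it was fetched from.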

When I last looked Firefox didn't support it natively but it was a requested feature.

I use SingleFile on Firefox quite often for this purpose. https://addons.mozilla.org/en-US/firefox/addon/single-file/
> When I last looked Firefox didn't support it natively but it was a requested feature.

That sounds familiar, unfortunately

There are Firefox plug-ins that claim to support saving as mhtml, but I have no experience with them.
I use it regularly. It works on static sites quite well, but subpages are not automatically saved, so the site is not crawled.
Unfortunately it's not supported by Safari either.
Yes! You know, I was considering this over the previous couple of days and was looking into how to construct an `mhtml` file for serving all the files at the same time. Unrelated to this project, I had the use case of a client wanting to keep an offline version of one of my projects.
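
For what it's worth, a minimal sketch of constructing such a file with Python's standard `email` package could look like the following. It assumes the usual layout of a `multipart/related` message with one part per resource and a `Content-Location` header on each. The function name `build_mhtml`, the URLs, and the page content are hypothetical, and I have not verified this output against any browser.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str,
                resources: dict[str, tuple[str, bytes]]) -> bytes:
    """Bundle a page and its resources into one multipart/related message.

    `resources` maps each resource URL to (MIME type, raw bytes).
    """
    root = MIMEMultipart("related", type="text/html")

    # The HTML page goes first; Content-Location records its original URL.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    # Every other resource becomes a base64-encoded part of its own.
    for url, (mime_type, payload) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(payload)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        root.attach(part)

    return root.as_bytes()


# Hypothetical page with a single image (the bytes are just the PNG signature).
HTML = "<html><body><img src='logo.png'></body></html>"
PNG_STUB = b"\x89PNG\r\n\x1a\n"
snapshot = build_mhtml(
    "https://example.com/",
    HTML,
    {"https://example.com/logo.png": ("image/png", PNG_STUB)},
)

# Round-trip check: parse the bytes back and list where each part came from.
for part in message_from_bytes(snapshot).get_payload():
    print(part.get_content_type(), part["Content-Location"])
```

Writing `snapshot` to a file with an .mhtml extension would be the next step, though a browser may expect additional headers that this sketch leaves out.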

> Although UNIX philosophy posits that it's good to have many small files, I like your idea for its contribution to reducing clutter (imagine running 'tree' in both scenarios) and also avoiding running out of inodes in some file systems (maybe less of a problem nowadays in general, not sure as I haven't generated millions of tiny files recently).

It's pretty rare for any website to have many files, as they optimize to have as few files as possible (fewer network requests, which could be slower than just shipping a big file). I have crawled the React docs as a test, and the result is a zip file of 147 MB with 3,803 files (including external resources).

https://docs.solidjs.com/ is 12 MB (including external resources) with 646 files.

I'm trying to use this for mirroring a documentation site. I'm disappointed that (1) it runs quite slowly and (2) it kept outputting error messages like "ProtocolError: Protocol error (Page.bringToFront): Not attached to an active page". I'm not sure what the reason is.
If the URL is public you may post it here or in a GitHub issue, so I can take a look at what's wrong with it.
I did not reproduce it, but 'wget -m --page-requisites --convert-links <url>' did a good job for me. Never mind.
SingleFile extension is the modern equivalent these days.
I just opened an .mht file from 2000 on Edge/Mac the other day and it displayed just fine.