Hacker News

	by jll29 779 days ago
	Microsoft Internet Explorer (no, I'm not using it personally) had a file format called .mht that could save an HTML page together with all the files it references, such as inline images. I believe you could not store more than one page in one .mht file, though, so your work could be seen as an extension. Although the UNIX philosophy holds that it's good to have many small files, I like your idea for its contribution to reducing clutter (imagine running 'tree' in both scenarios) and to avoiding running out of inodes on some file systems (maybe less of a problem nowadays in general; I'm not sure, as I haven't generated millions of tiny files recently).

jdougan 779 days ago

.mht is alive and well. It is a MIME wrapper around the files, is generated by the "Webpage as single file" save option in Chrome, Opera, and Edge, and uses .mhtml as the default extension.

When I last looked Firefox didn't support it natively but it was a requested feature.
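
A minimal sketch of the "MIME wrapper" idea, assuming Python's standard-library `email` package: one `multipart/related` message whose first part is the HTML page and whose other parts are the files it references, each tagged with the URL it was fetched from in a `Content-Location` header. The function names, the `Subject` value, and the sample URLs are illustrative assumptions; files saved by real browsers carry additional headers that are omitted here.

```python
from email import policy
from email.message import EmailMessage, MIMEPart
from email.parser import BytesParser


def build_mhtml(page_url: str, html: str,
                resources: dict[str, tuple[bytes, str]]) -> bytes:
    """Pack one HTML page plus its resources into a multipart/related message.

    `resources` maps each resource URL (as referenced from the HTML) to a
    (data, "maintype/subtype") pair. Names here are hypothetical.
    """
    msg = EmailMessage()
    msg["Subject"] = "Saved page"
    msg["MIME-Version"] = "1.0"
    msg.make_related()  # turn the empty message into a multipart/related container

    # Root part: the HTML document itself, tagged with its original URL.
    root = MIMEPart()
    root.set_content(html, subtype="html")
    # Set after set_content(), which clears existing Content-* headers.
    root["Content-Location"] = page_url
    msg.attach(root)

    # One base64-encoded part per referenced file; a reader resolves URLs
    # in the HTML by matching them against each part's Content-Location.
    for url, (data, mime_type) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEPart()
        part.set_content(data, maintype=maintype, subtype=subtype)
        part["Content-Location"] = url
        msg.attach(part)

    # Serialize with CRLF line endings, as MIME messages use on the wire.
    return msg.as_bytes(policy=policy.SMTP)


def list_parts(mhtml: bytes) -> list[tuple[str, str]]:
    """Return (Content-Location, content type) for each part of an MHTML file."""
    msg = BytesParser(policy=policy.default).parsebytes(mhtml)
    return [(str(p["Content-Location"]), p.get_content_type())
            for p in msg.iter_parts()]
```

Because everything lives in one MIME message, a single file holds the page and all of its inline resources, and `list_parts` can be pointed at an existing .mht/.mhtml file to see what it contains.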

venusenvy47 778 days ago

I use SingleFile on Firefox quite often for this purpose. https://addons.mozilla.org/en-US/firefox/addon/single-file/

rrr_oh_man 779 days ago

> When I last looked Firefox didn't support it natively but it was a requested feature.

That sounds familiar, unfortunately

jdougan 779 days ago

There are Firefox plug-ins that claim to support saving as MHTML; I have no experience with them.

lukan 778 days ago

I use it regularly. It works quite well on static sites, but linked subpages are not saved automatically, i.e. it doesn't crawl.

felipefar 779 days ago

Unfortunately it's not supported by Safari either.

unlog 779 days ago

Yes! You know, I was considering this over the past couple of days and was looking around for how to construct an `mhtml` file for serving all the files at once. Unrelated to this project, I had a client who wanted to keep an offline version of one of my projects.

> Although UNIX philosophy posits that it's good to have many small files, I like your idea for its contribution to reducing clutter (imagine running 'tree' in both scenarios) and also avoiding running out of inodes in some file systems (maybe less of a problem nowadays in general, not sure as I haven't generated millions of tiny files recently).

It's pretty rare for a website to have many files, since sites are optimized to ship as few files as possible (fewer network requests, which can be slower than just shipping one big file). As a test I crawled the React docs, and the result is a 147 MB zip file with 3,803 files (including external resources).

https://docs.solidjs.com/ is 12 MB (including external resources) with 646 files.

mikeqq2024 777 days ago

I tried using this to mirror a documentation site and was disappointed that 1. it ran quite slowly, and 2. it kept printing error messages like "ProtocolError: Protocol error (Page.bringToFront): Not attached to an active page". Not sure of the reason.

unlog 777 days ago

If the URL is public, you can post it here or in a GitHub issue so I can take a look at what's wrong.

mikeqq2024 776 days ago

I couldn't reproduce it, but `wget -m --page-requisites --convert-links <url>` did a good job for me. Never mind.

ajvs 779 days ago

SingleFile extension is the modern equivalent these days.

marban 778 days ago

I just opened an .mht file from 2000 on Edge/Mac the other day and it displayed just fine.