by gildas 2308 days ago
SingleFile [1] can do that, i.e. automatically save viewed pages and/or bookmarks, though it saves pages as HTML files. Alternatively, SingleFileZ [2] can do the same but produces zip files (disguised as HTML files). Disclosure: I'm the author.

[1] https://github.com/gildas-lormeau/SingleFile

[2] https://github.com/gildas-lormeau/SingleFileZ


I believe I'm using something similar. I wrote a Puppeteer script that drives an automated browser through every page and saves each one as `.mhtml`. This works quite well for my purposes. I was archiving a site with paid content that sits behind my login, and since I often use that material offline, I needed to put together this hack.
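The comment doesn't show how the script "goes through every page". One common approach, sketched here under assumptions rather than as the poster's actual code, is a breadth-first crawl that stays on the site's origin and treats `/page` and `/page#section` as the same page. `crawlSite`, `normalize`, and the `getLinks` callback are hypothetical names. In a real Puppeteer script, `getLinks` might open the page in a browser launched with a logged-in profile (`puppeteer.launch({ userDataDir })`) and collect links with `page.$$eval('a[href]', (as) => as.map((a) => a.href))`; the loop is also where each URL would be handed to a save step.

```javascript
// Breadth-first crawl restricted to the start URL's origin (a sketch).
// getLinks(url) is a hypothetical async callback returning the hrefs on a page.
async function crawlSite(startUrl, getLinks, maxPages = 1000) {
  const origin = new URL(startUrl).origin
  const start = normalize(startUrl)
  const seen = new Set([start]) // every URL ever queued, to avoid revisits
  const queue = [start]         // FIFO queue gives breadth-first order
  const visited = []

  while (queue.length > 0 && visited.length < maxPages) {
    const url = queue.shift()
    visited.push(url) // a real archiver would save the page here
    for (const href of await getLinks(url)) {
      let next
      try {
        next = new URL(href, url) // resolve relative links against the current page
      } catch {
        continue // skip malformed hrefs
      }
      if (next.origin !== origin) continue // stay on the archived site (drops mailto:, other hosts)
      const key = normalize(next.href)
      if (!seen.has(key)) {
        seen.add(key)
        queue.push(key)
      }
    }
  }
  return visited
}

// Drop the #fragment so /a and /a#section count as the same page.
function normalize(href) {
  const u = new URL(href)
  u.hash = ''
  return u.href
}

// Usage with a fake site (hypothetical data, no browser needed):
const demoSite = {
  'https://example.com/': ['/a', '/b#top', 'https://other.org/x'],
  'https://example.com/a': ['/', '/b'],
}
crawlSite('https://example.com/', async (u) => demoSite[u] || [])
  .then((pages) => console.log(pages)) // the three example.com pages, in BFS order
```

The `maxPages` cap is an arbitrary safety limit; injecting `getLinks` keeps the traversal logic testable without launching a browser.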

The code below saves each page as a single file:

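```javascript
const fs = require('fs')
const slugify = require('slugify') // assumption: any URL-to-filename helper works here

// Save one page as a single self-contained .mhtml file.
// `browser` is a Puppeteer Browser, e.g. from `await puppeteer.launch()`.
async function savePage(browser, url) {
    const page = await browser.newPage()
    const response = await page.goto(url, { timeout: 50000 })

    // goto() can resolve to null (e.g. same-document navigation), so guard it
    if (!response || response.status() === 404) {
        await page.close()
        throw new Error('not found')
    }

    // credit: https://stackoverflow.com/questions/54814323/puppeteer-how-to-download-entire-web-page-for-offline-use
    // Page.captureSnapshot (Chrome DevTools Protocol) returns the whole page as MHTML text
    const cdp = await page.target().createCDPSession()
    const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' })

    fs.mkdirSync('./data', { recursive: true })
    const htmlFilename = './data/' + slugify(url) + '.mhtml'
    fs.writeFileSync(htmlFilename, data)
    await page.close() // don't leak a tab per archived page
}
```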
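The snippet relies on `slugify(url)` without showing where it comes from. If you'd rather avoid a dependency, a minimal stand-in could look like the following. `urlToFilename` is a hypothetical name, and the scheme (strip the protocol, collapse every run of unsafe characters into `-`, cap the length) is an assumption, not what the original poster used.

```javascript
// Hypothetical replacement for slugify(url): turns a URL into a string that is
// safe as a file name on common filesystems. Slashes are replaced, so the
// result cannot escape the ./data directory.
function urlToFilename(url, maxLength = 200) {
  const slug = url
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, '') // drop the scheme, e.g. "https://"
    .replace(/[^a-z0-9._-]+/gi, '-')         // collapse runs of unsafe characters into "-"
    .replace(/^-+|-+$/g, '')                  // trim leading/trailing dashes
  return slug.slice(0, maxLength) || 'index' // never return an empty name
}

// e.g. urlToFilename('https://example.com/docs/page?id=42#intro')
//   returns 'example.com-docs-page-id-42-intro'
```

One trade-off of this scheme: distinct URLs such as `/a?b` and `/a-b` map to the same name, so a real archive may want to append a short hash of the full URL.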
I've been looking for something like this for a while now, to store every page I visit in a personal archive, but all the options I found either involved setting up a proxy that MITMs all my requests (too much effort) or saved pages in a format I could not easily access.

So far, SingleFile looks like a perfect fit, thanks!