by santa_boy
2306 days ago
I believe I'm using something similar. I wrote a Puppeteer script that drives a browser through every page and saves each one as an `.mhtml` file.
This works quite well for my purpose: I was archiving a site whose content I pay for and which sits behind my login.
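I often use material from it when I'm offline, hence this hack. The code below saves a page as a single file:

```javascript
const fs = require('fs')

// Save one page as a single .mhtml file. `browser` is a launched Puppeteer
// browser; `slugify` (not shown) turns the URL into a safe filename.
async function savePageAsMhtml(browser, url) {
  const page = await browser.newPage()
  try {
    const response = await page.goto(url, { timeout: 50000 })
    // goto() can resolve to null, so guard before reading the status
    if (response && response.status() === 404) {
      throw new Error('not found')
    }
    // credit: https://stackoverflow.com/questions/54814323/puppeteer-how-to-download-entire-web-page-for-offline-use
    const cdp = await page.target().createCDPSession()
    const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' })
    fs.mkdirSync('./data', { recursive: true }) // make sure the output dir exists
    fs.writeFileSync('./data/' + slugify(url) + '.mhtml', data)
  } finally {
    await page.close() // close the tab on success and on failure
  }
}
```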
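`slugify` isn't defined in the snippet (the npm `slugify` package is one option). A minimal hypothetical stand-in, assuming all you need is a filesystem-safe name derived from the URL, could look like this. It is a sketch, not what the original code used:

```javascript
// Hypothetical helper: turn a URL into a filesystem-safe filename stem.
// Assumption: readable names matter more than perfect uniqueness.
function slugify(url) {
  return url
    .replace(/^https?:\/\//i, '') // drop the scheme
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // collapse each run of unsafe chars to '-'
    .slice(0, 200)                // stay well under common filename limits
    .replace(/^-+|-+$/g, '')      // trim leading/trailing dashes
}

console.log(slugify('https://example.com/Docs/Page?id=42')) // example-com-docs-page-id-42
```

Note that different URLs can map to the same slug (e.g. `/x-y` and `/x/y`); if that matters for your site, appending a short hash of the full URL avoids overwriting files.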