| HN Mirror

	by slashdot2008 1808 days ago
	It would be nice if it were a PDF I could download and save for later. Is there any way to turn a series of pages into a PDF? Like a recursive wget and then pipe through pandoc?

5 comments

seanhunter 1808 days ago

Lots of ways to do this, but one way is to install poppler-utils so you get pdfunite, make sure the filenames for your pages sort lexicographically in the order you want the pages to end up[1], then run:

    pdfunite page*.pdf output.pdf

I have had decent results using pdftk as well to do PDF surgery, so that's another option.

In this case, if you do a recursive wget I think it should "just work" because the files are named in a friendly way.

So, putting it all together:

    wget -r 'https://dropbox.github.io/dbx-career-framework/overview.html'
    cd dropbox.github.io/dbx-career-framework
    # Convert each matching HTML page to a PDF with the same base name
    for f in ic*software*.html; do
        pandoc --pdf-engine=wkhtmltopdf "$f" -o "${f%.html}.pdf"
    done
    # Merge the PDFs in glob (lexicographic) order
    pdfunite ic*.pdf output.pdf

[1] i.e. the ordering of the output of "ls" is the order you want the pages in the output PDF.
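
If the pages are numbered without leading zeros, a glob like page*.pdf puts page10.pdf before page2.pdf. A minimal sketch of one way to fix that before running pdfunite, assuming files named page<N>.pdf (the pad_pages function name and the three-digit width are illustrative choices, not part of any tool):

```shell
# Rename page1.pdf, page2.pdf, ..., page10.pdf to page001.pdf, page002.pdf, ...
# so that the glob page*.pdf expands in numeric order.
# pad_pages is a hypothetical helper; it takes an optional directory argument.
pad_pages() {
    dir=${1:-.}
    for f in "$dir"/page*.pdf; do
        [ -e "$f" ] || continue          # glob matched nothing
        base=${f##*/}                    # strip the directory part
        num=${base#page}
        num=${num%.pdf}
        case $num in
            ''|*[!0-9]*) continue ;;     # skip names that are not page<digits>.pdf
        esac
        # Drop leading zeros so printf does not read the number as octal
        while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
            num=${num#0}
        done
        new=$(printf 'page%03d.pdf' "$num")
        # Skip files that are already padded, and never overwrite an existing file
        [ -e "$dir/$new" ] && continue
        mv -- "$f" "$dir/$new"
    done
}
```

After running pad_pages in the directory, pdfunite page*.pdf output.pdf should pick the pages up in numeric order.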

mahalol 1808 days ago

A bit more manual, but I've been saving webpages I like in Obsidian.

First, click the reader view in Firefox, then select all, then paste it into a new Obsidian page. It's really good at keeping the formatting and importing pictures, etc. You can then export the result to PDF if so desired.

liketochill 1808 days ago

Not sure why you are downvoted. I save everything I want to refer to again as PDF because stuff on the web disappears. I can search all the PDFs I have offline with Qiqqa or Mendeley. Used to use Google Desktop for PDF search.

zaptheimpaler 1808 days ago

Check out https://archivebox.io/ for a great self-hosted solution. It's one of the best such programs I've found.

You can hack together some scripts to do the basics yourself, but archiving arbitrary pages is pretty difficult to get right.

qbasic_forever 1808 days ago

Print to PDF. Most browsers support it natively if your OS doesn't already.

johntash 1808 days ago

Unless I'm mistaken, this doesn't print pages recursively.
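
One way to get closer to that is to script the browser's print-to-PDF over a list of URLs. A rough sketch, assuming a Chromium-based browser with headless mode is installed; the binary name in BROWSER is an assumption (it may be chromium, chromium-browser, or google-chrome on your system), and print_pages is a hypothetical helper that still needs you to supply the URL list yourself:

```shell
# BROWSER is an assumption; override it to match the binary on your system.
BROWSER=${BROWSER:-chromium}

# Read one URL per line from stdin and print each to page001.pdf, page002.pdf, ...
# The zero-padded names keep the later page*.pdf glob in the right order.
print_pages() {
    n=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue        # skip blank lines
        n=$((n + 1))
        out=$(printf 'page%03d.pdf' "$n")
        # </dev/null stops the browser from consuming the rest of the URL list
        "$BROWSER" --headless --print-to-pdf="$out" "$url" </dev/null
    done
}
```

Usage would be something like print_pages < urls.txt followed by pdfunite page*.pdf output.pdf. It does not discover links on its own, so it is only as recursive as the list you feed it.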