by seanhunter 1808 days ago
There are lots of ways to do this, but one is to install poppler-utils so you get pdfunite, make sure the filenames for the pages sort lexicographically in the order you want the pages to end up[1], then run

    pdfunite page*.pdf output.pdf

I've also had decent results using pdftk for PDF surgery, so that's another option.
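
If you want one command that works with whichever tool is installed, something like this might do. It's only a sketch: merge_pdfs is a name I made up, and it assumes the inputs are passed already in page order. The pdftk form uses its "cat" operation.

```shell
# Hypothetical wrapper: merge_pdfs OUTPUT INPUT...
# Uses pdfunite if present, otherwise falls back to pdftk.
merge_pdfs() {
    out=$1; shift
    if command -v pdfunite >/dev/null 2>&1; then
        pdfunite "$@" "$out"              # inputs first, output last
    elif command -v pdftk >/dev/null 2>&1; then
        pdftk "$@" cat output "$out"      # pdftk's concatenate operation
    else
        echo "neither pdfunite nor pdftk found" >&2
        return 127
    fi
}
```

Usage would be e.g. `merge_pdfs output.pdf page*.pdf`.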

In this case, if you do a recursive wget I think it should "just work" because the files are named in a friendly way.

So, putting it all together:

    wget -r 'https://dropbox.github.io/dbx-career-framework/overview.html'
    cd dropbox.github.io/dbx-career-framework
    # convert each matching page to a PDF with the same base name
    for f in ic*software*.html; do
        pandoc --pdf-engine=wkhtmltopdf "$f" -o "${f%.html}.pdf"
    done
    pdfunite ic*.pdf output.pdf

[1] i.e. the order in which "ls" lists the files is the order the pages will have in the output PDF
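
To expand on [1]: the usual gotcha is that page10.pdf sorts before page2.pdf. A rough sketch of a fix, assuming your files are named page<N>.pdf (that naming and the pad_pages helper are just for illustration), is to zero-pad the numbers first:

```shell
# Hypothetical helper: rename page<N>.pdf to page<NNN>.pdf so that
# glob/ls order matches numeric page order.
pad_pages() {
    for f in page*.pdf; do
        [ -e "$f" ] || continue                    # glob matched nothing
        n=${f#page}; n=${n%.pdf}                   # keep only the number
        case $n in ''|*[!0-9]*) continue ;; esac   # skip non-numeric names
        n=$(printf '%s' "$n" | sed 's/^0*//')      # avoid octal parsing of 08, 09
        new=$(printf 'page%03d.pdf' "${n:-0}")
        [ -e "$new" ] || mv -- "$f" "$new"         # never overwrite a file
    done
}
```

After running pad_pages in the directory, `pdfunite page*.pdf output.pdf` should pick the pages up in numeric order.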