Hacker News
by flashman 3230 days ago
How well does it work with multiple-page PDFs? One of our banes is generating mixed text/image downloadable reports with sensible page breaks. To save time, we're actually doing those as docx files, with the bonus/risk that clients can edit the content before saving it as a PDF.
3 comments

The holy grail of PDF generators: sensible page breaks. It's never been done, and it makes peace in the Middle East seem like an easy task.
PrinceXML does a solid job of it, and has for years.
Just use WkhtmlToPdf https://wkhtmltopdf.org and wrap a simple service around it.
That's one hell of a "just".
I did it in 2 days. It's not very hard.
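
For anyone wondering what the core of such a wrapper looks like, here is a minimal sketch in Python. It assumes the wkhtmltopdf binary is on the PATH; the function names and the default footer are mine, not anything from the thread. The HTTP layer, queuing, and error reporting (the part that makes it "one hell of a just") are left out.

```python
import subprocess


def build_command(page_size="A4", footer_center="[page] / [topage]"):
    """Return the wkhtmltopdf argument list for one conversion."""
    return [
        "wkhtmltopdf",
        "--quiet",
        "--page-size", page_size,
        "--footer-center", footer_center,  # [page]/[topage] are expanded by wkhtmltopdf
        "-",  # read the HTML from stdin
        "-",  # write the PDF to stdout
    ]


def html_to_pdf(html, **options):
    """Convert an HTML string to PDF bytes.

    Raises subprocess.CalledProcessError if wkhtmltopdf exits non-zero.
    """
    result = subprocess.run(
        build_command(**options),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
        timeout=60,  # hypothetical limit; tune for large reports
    )
    return result.stdout
```

Usage would be something like `pdf_bytes = html_to_pdf("<h1>Report</h1>", page_size="Letter")`, with whatever web framework you like in front of it.
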
DonnyV, don't you know we need 15 million of the same things in technology :)
Wkhtmltopdf has basically not been updated in years aside from minor bug fixes. It has major issues the author has no plans to fix. I've spent hundreds of hours applying workarounds to legacy codebases. All that code could be refactored now. PhantomJS and wkhtmltopdf don't even support doing $('.htmlE').width() from JavaScript. Needless to say, this can complicate laying out the page. https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2419
Why are you doing layout code in JavaScript? If it can't be done using CSS then you're doing something wrong. This is being used to generate PDF documents of an exact width and height. Hard-code the width and height of your page.
wkhtmltopdf is a solid tool, but it's mostly abandoned now. Rendering in headless Chrome is way faster and more accurate. Headless Chrome, though, is lacking a lot of features that wkhtmltopdf provides, like headers/footers.
Headers/footers are 2 huge features. Plus all of the other small features wkhtmltopdf has. It may not have all the new bells and whistles but it works reliably.
There's CSS that's supposed to help with doing this manually, right? Or does your docx export handle this auto-magically? (If so, I'm interested in more details!)

https://developer.mozilla.org/en-US/docs/Web/CSS/page-break-...
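
As a rough illustration of the manual approach, here is a sketch that generates report HTML with print CSS attached. The `page-break-*` properties and `@page` are standard CSS, but how well a given renderer honors them varies, so treat the rules as hints rather than guarantees. The `chapter` class name and the `build_report` helper are made up for the example.

```python
from html import escape

# Print rules: keep headings with their content, avoid splitting figures
# and table rows, and start each chapter after the first on a new page.
PRINT_CSS = """
@page { size: A4; margin: 20mm; }
h2 { page-break-after: avoid; }
figure, table, tr { page-break-inside: avoid; }
section.chapter { page-break-before: always; }
section.chapter:first-of-type { page-break-before: auto; }
"""


def build_report(chapters):
    """Build a printable HTML string from (title, paragraphs) tuples."""
    body = []
    for title, paragraphs in chapters:
        paras = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
        body.append(
            f'<section class="chapter"><h2>{escape(title)}</h2>{paras}</section>'
        )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<style>{PRINT_CSS}</style></head>"
        f"<body>{''.join(body)}</body></html>"
    )
```

The resulting string can be handed to whichever HTML-to-PDF renderer you use.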

CSS isn't really that flexible, though. Say you want a repeated header or footer on each page: you are going to have a real hard time getting it to work perfectly. And good luck if you then need to internationalize it and support different document sizes (A4 vs. legal). It's not impossible, but it's a lot more work than it should be.

One of the best ways I've used to generate PDFs is by using a DOCX as a template, and replacing certain placeholders within the document (a DOCX is a ZIP containing a few XML files). It's great if you work in a corporate environment, as it's easy for non-technical staff to make it look exactly how they want and it's easy to update (just replace a file and check it works). You can use headless LibreOffice to convert DOCX to PDF.
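
A minimal sketch of that approach in Python, using only the standard library. The `{{name}}` placeholder syntax and both function names are assumptions for illustration, and it assumes `soffice` (LibreOffice) is on the PATH. One known caveat: Word sometimes splits a placeholder across several XML runs, in which case a plain text substitution like this will not find it, so retype placeholders in one go when editing the template.

```python
import re
import subprocess
import zipfile
from xml.sax.saxutils import escape

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # hypothetical {{name}} syntax


def fill_docx(template_path, output_path, values):
    """Copy a DOCX, substituting {{name}} placeholders in word/document.xml."""
    def substitute(match):
        name = match.group(1)
        # Unknown placeholders are left untouched so they are easy to spot.
        if name not in values:
            return match.group(0)
        return escape(str(values[name]))  # keep the XML well-formed

    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = PLACEHOLDER.sub(substitute, data.decode("utf-8")).encode("utf-8")
            dst.writestr(item, data)


def docx_to_pdf(docx_path, out_dir):
    """Convert a DOCX to PDF with headless LibreOffice."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, docx_path],
        check=True,
    )
```

Typical use would be `fill_docx("template.docx", "report.docx", {"client": "Acme"})` followed by `docx_to_pdf("report.docx", "out")`.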

I was wondering why OP stopped at DOCX without taking it from there to PDF. The suggestion of a template document is a very practical tip; thanks!