Hacker News
by mpweiher 2776 days ago
I've seen a bunch of these (HTML -> PDF). I've never seen a succinct answer to: "How is this different/better than taking <random web browser> and hitting "print", which at least on OS X will produce a nice PDF?"
5 comments

Prince is pretty powerful when it comes to print-specific stuff. We care about pagination, making tables look good across page breaks, footnotes, great justification, table of contents, non-sRGB color space handling, crop marks, etc. Also having great accessibility annotations (often mandatory for government documents). These are things that web browsers are less concerned with - print-to-PDF is more of an afterthought, whereas for us it's our main area of focus.
That's not even close to what you get with a good HTML -> PDF export, which can include anything from proper pagination, page margins and TOCs, to orphans handling and other such concerns.
The Symfony project uses PrinceXML to generate its documentation (including The Book) and it's phenomenally good.

https://symfony.com/doc/current/index.html#gsc.tab=0

Select offline, The book, 4.2 and it generates the book on the fly.

I'm not the guy you asked but I've been using PrinceXML to produce PDFs intended for customers of our client (e.g. invoices, terms and conditions, itineraries, etc.). Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

Full disclosure: If I'd had my way we would have used LaTeX templates to produce the PDFs but the previous developers had already implemented the HTML->PDF flow, so we just replaced the old, defunct service with Prince, which did a surprisingly good job, IMO.

>Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

It's not just that. For basic stuff, print to PDF can be an option. For complex documents, print workflows, etc., it's a non-starter.

> "How is this different/better than taking <random web browser> and hitting "print"

If you have to repeat this process 2000 times, it becomes time-consuming. Doing it manually doesn't scale for a single user who needs 2000 PDFs.

You don't know about headless browser modes. There are a bunch of scriptable CLIs.
I do know about headless browsers. The comment above mine mentioned a manual process, no scripts or headless browsers. A combo of curl, wkhtmltopdf (or another HTML-to-PDF tool) and a for loop can do this in a bash one-liner.
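
For anyone who wants the batch version spelled out, here is a rough sketch in Python rather than bash. It assumes wkhtmltopdf is installed and on your PATH; since wkhtmltopdf can fetch a URL itself, the curl step is skipped. The function names, the `doc-0001.pdf` naming scheme and the example URLs are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(url: str, out_dir: Path, index: int) -> list[str]:
    """Return the wkhtmltopdf argv for one URL (hypothetical naming scheme)."""
    out_file = out_dir / f"doc-{index:04d}.pdf"
    # wkhtmltopdf takes the input (URL or file) followed by the output path.
    return ["wkhtmltopdf", "--quiet", url, str(out_file)]


def convert_all(urls: list[str], out_dir: Path) -> list[Path]:
    """Convert each URL to a PDF and return the paths that succeeded."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for i, url in enumerate(urls, start=1):
        cmd = build_command(url, out_dir, i)
        result = subprocess.run(cmd)
        # Keep going on failure so one bad page does not stop the batch.
        if result.returncode == 0:
            produced.append(Path(cmd[-1]))
    return produced
```

Calling `convert_all(["https://example.com/invoice/1", "https://example.com/invoice/2"], Path("out"))` would write one PDF per URL into `out/`. Swapping the argv in `build_command` for a headless browser or Prince invocation should work the same way, though the exact flags depend on the tool.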