by mikeday 2773 days ago
We use Mercury at YesLogic to write Prince, our HTML to PDF formatter! [1]

We chose it because logic/functional languages are great for tree processing, Mercury was designed for large projects, and because in 2002 there really weren't many other options around.

Its syntax and semantics are derived from Prolog, and it borrows a lot from Haskell (types, type classes). In spirit it's reminiscent of OCaml (niche, a little weird), and its support for unique modes gives it some interesting overlap with Rust, although this aspect of the language still needs more compiler support.

All in all, definitely worth checking out.

[1] https://www.princexml.com/

3 comments

I've seen a bunch of these (HTML -> PDF). I've never seen a succinct answer to: "How is this different/better than taking <random web browser> and hitting "print", which at least on OS X will produce a nice PDF?"
Prince is pretty powerful when it comes to print-specific stuff. We care about pagination, making tables look good across page breaks, footnotes, great justification, tables of contents, non-sRGB color space handling, crop marks, etc. We also care about great accessibility annotations (often mandatory for government documents). These are things that web browsers are less concerned with: print-to-PDF is more of an afterthought for them, whereas for us it's our main area of focus.
That's not even close to what you get with a good HTML -> PDF export, which can include anything from proper pagination, page margins and TOCs, to orphans handling and other such concerns.
The Symfony project uses PrinceXML to generate their documentation (including The Book) and it's phenomenally good.

https://symfony.com/doc/current/index.html#gsc.tab=0

Select offline, The book, 4.2 and it generates the book on the fly.

I'm not the guy you asked but I've been using PrinceXML to produce PDFs intended for customers of our client (e.g. invoices, terms and conditions, itineraries, etc.). Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

Full disclosure: If I'd had my way we would have used LaTeX templates to produce the PDFs but the previous developers had already implemented the HTML->PDF flow, so we just replaced the old, defunct service with Prince, which did a surprisingly good job, IMO.

>Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

It's not just that. For basic stuff, print to PDF can be an option. For complex documents, print workflows, etc., it's a non-starter.

> "How is this different/better than taking <random web browser> and hitting "print"

If you have to repeat this process 2000 times, it becomes time-consuming. Doing it manually doesn't scale for a single user who needs 2000 PDFs.

You don't know about headless browser modes. There are a bunch of scriptable CLIs.
I do know about headless browsers. The comment above mine mentioned a manual process, no scripts or headless browsers. A combo of curl, wkhtmltopdf (or another HTML-to-PDF tool), and a for loop can do this in a bash one-liner.
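
For anyone who wants more than a one-liner, here is a rough Python sketch of the same loop. It assumes wkhtmltopdf is on the PATH and is called as `wkhtmltopdf <input> <output>`. The function names and the `page-N.pdf` naming are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(url, out_path, tool="wkhtmltopdf"):
    """Return the argv for one conversion: <tool> <input URL> <output PDF>."""
    return [tool, url, str(out_path)]


def convert_all(urls, out_dir, run=subprocess.run):
    """Convert each URL to out_dir/page-N.pdf.

    `out_dir` is assumed to exist already. `run` is injectable so the loop
    can be exercised without the converter installed.
    """
    out_dir = Path(out_dir)
    outputs = []
    for i, url in enumerate(urls, start=1):
        out_path = out_dir / f"page-{i}.pdf"
        # check=True raises CalledProcessError if the tool exits non-zero.
        run(build_command(url, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

Swapping in another converter is a matter of changing `build_command`, provided the other tool takes its input and output in the same order.
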
Prince is cool; I used it 10 years ago or so. No fuss about that.

It's a bit pricey, though (at least, pricier than "free"). So we're using WeasyPrint on more recent projects.

WeasyPrint is open source and written in Python. It's much slower than Prince, but this can be mitigated by caching renderings. I wouldn't bet that it's as standards-compliant or bug-free as Prince, but it's good enough for us.

When/if our customers ask for more speed or pixel-perfect support (with the $$$ to match), we will definitely try Prince again.
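
A minimal sketch of what caching renderings could look like, assuming the PDF depends only on the HTML string. The `cached_render` helper and its `render` callback are hypothetical, not part of WeasyPrint. With WeasyPrint the callback could be something like `lambda html: weasyprint.HTML(string=html).write_pdf()`.

```python
import hashlib
from pathlib import Path


def cached_render(html, cache_dir, render):
    """Return PDF bytes for `html`, calling `render` only on a cache miss.

    The cache key is the SHA-256 of the HTML text, so identical input
    reuses the file written by an earlier call.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(html.encode("utf-8")).hexdigest()
    cached = cache_dir / f"{key}.pdf"
    if cached.exists():
        return cached.read_bytes()
    pdf = render(html)  # the slow step
    cached.write_bytes(pdf)
    return pdf
```

Note that this key ignores external stylesheets and images, so a real setup would need to fold those into the hash or invalidate the cache when they change.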

The Java class caught my eye. Is that a wrapper around a native lib, or do you make RPC calls to something?

HTML to PDF is something I never thought about since Firefox does it (and results usually aren’t great).

It's a wrapper around the native process, just to simplify passing command-line arguments. (There is also a persistent process mode for speeding up batch processing of many small documents.)

The browsers don't specialise in PDF generation, and we do :)
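
To illustrate the general idea of such a wrapper (this is not the actual Java class, just a sketch in Python): an object collects options and turns them into an argument list for the native process. The class and method names are made up, and the sketch assumes the `prince` executable accepts `-s <stylesheet>` and `-o <output>`, which should be checked against the Prince documentation.

```python
import subprocess


class PrinceWrapper:
    """Thin wrapper that turns options into command-line arguments."""

    def __init__(self, executable="prince", run=subprocess.run):
        self.executable = executable
        self.run = run  # injectable so the wrapper can be tested without Prince
        self.stylesheets = []

    def add_stylesheet(self, path):
        """Remember an extra stylesheet to apply to the document."""
        self.stylesheets.append(path)

    def build_args(self, input_path, output_path):
        """Build the argv list; the flags used here are assumptions."""
        args = [self.executable]
        for css in self.stylesheets:
            args += ["-s", css]
        args += [input_path, "-o", output_path]
        return args

    def convert(self, input_path, output_path):
        """Run the process and report success based on its exit code."""
        result = self.run(self.build_args(input_path, output_path))
        return result.returncode == 0
```

Callers then write `PrinceWrapper().convert("invoice.html", "invoice.pdf")` instead of assembling the command line themselves.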