| HN Mirror

Hacker News

by mikeday 2820 days ago

We use Mercury at YesLogic to write Prince, our HTML to PDF formatter! [1]

We chose it because logic/functional languages are great for tree processing, Mercury was designed for large projects, and because in 2002 there really weren't many other options around.

Its syntax and semantics are derived from Prolog, and it borrows a lot from Haskell (types, type classes). In spirit it's reminiscent of OCaml (niche, a little weird), and with its support for unique modes there is some interesting overlap with Rust, although this aspect of the language still needs more compiler support.

All in all, definitely worth checking out.

[1] https://www.princexml.com/

mpweiher 2820 days ago

I've seen a bunch of these (HTML -> PDF). I've never seen a succinct answer to: "How is this different/better than taking <random web browser> and hitting "print", which at least on OS X will produce a nice PDF?"

bjz_ 2820 days ago

Prince is pretty powerful when it comes to print-specific stuff. We care about pagination, making tables look good across page breaks, footnotes, great justification, tables of contents, non-sRGB color space handling, crop marks, etc. We also produce thorough accessibility annotations (often mandatory for government documents). These are things web browsers are less concerned with: for them print-to-PDF is more of an afterthought, whereas for us it's our main area of focus.

coldtea 2820 days ago

That's not even close to what you get with a good HTML -> PDF export, which can include anything from proper pagination, page margins and TOCs, to orphans handling and other such concerns.
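For readers unfamiliar with how these print concerns are expressed, a minimal sketch follows. It assumes the HTML-to-PDF tool honours standard CSS Paged Media rules (`@page`, page counters, `orphans`/`widows`, `break-inside`), which Prince and WeasyPrint both do to varying degrees. The stylesheet values and the helper name `build_document` are hypothetical, chosen only for illustration.

```python
from html import escape

# Hypothetical print stylesheet using standard CSS Paged Media features.
# How completely each rule is honoured depends on the renderer.
PRINT_CSS = """
@page {
  size: A4;
  margin: 20mm 15mm;
  @bottom-center { content: "Page " counter(page) " of " counter(pages); }
}
p { orphans: 3; widows: 3; }           /* avoid stranded lines at page breaks */
tr { break-inside: avoid; }            /* keep a table row on one page */
thead { display: table-header-group; } /* paged renderers can repeat these rows after a break */
"""


def build_document(title: str, body_html: str) -> str:
    """Wrap (trusted) body HTML in a full document that embeds PRINT_CSS."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title>"
        f"<style>{PRINT_CSS}</style>"
        f"</head><body>{body_html}</body></html>"
    )
```

The resulting string can then be handed to whichever formatter is in use; a browser's print dialog may ignore some of these rules, which is the gap the comments above are describing.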

noir_lord 2820 days ago

The Symfony project uses PrinceXML to generate its documentation (including The Book), and it's phenomenally good.

https://symfony.com/doc/current/index.html#gsc.tab=0

Select "Offline", then "The Book" and version 4.2, and it generates the book on the fly.

lillesvin 2820 days ago

I'm not the guy you asked but I've been using PrinceXML to produce PDFs intended for customers of our client (e.g. invoices, terms and conditions, itineraries, etc.). Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

Full disclosure: if I'd had my way, we would have used LaTeX templates to produce the PDFs, but the previous developers had already implemented the HTML->PDF flow, so we just replaced the old, defunct service with Prince, which did a surprisingly good job, IMO.

coldtea 2820 days ago

>Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

It's not just that. For basic stuff, Print to PDF can be an option. For complex documents, print workflows, etc., it's a non-starter.

justbaker 2820 days ago

> "How is this different/better than taking <random web browser> and hitting "print"

If you have to repeat this process 2000 times, it becomes time-consuming. Doing it manually doesn't scale for a single user who needs 2000 PDFs.

posterboy 2820 days ago

You don't know about headless browser modes? There are a bunch of scriptable CLIs.
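As an illustration of the scriptable route, here is a minimal sketch that drives headless Chrome/Chromium from Python using its `--headless` and `--print-to-pdf=` switches. The binary name `chromium` is an assumption (it varies by platform, e.g. `google-chrome` or a full application path on macOS), and the helper names are hypothetical.

```python
import subprocess
from typing import List


def chrome_pdf_command(url: str, out_path: str, chrome: str = "chromium") -> List[str]:
    """Build argv for printing one page to PDF with headless Chrome/Chromium.

    The default binary name is an assumption; adjust it for your system.
    """
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={out_path}", url]


def print_to_pdf(url: str, out_path: str, chrome: str = "chromium") -> None:
    """Run the browser; raises CalledProcessError if it exits non-zero."""
    subprocess.run(chrome_pdf_command(url, out_path, chrome), check=True)
```

This automates the browser's own print path, so the output has the same layout limitations as hitting "Print" by hand; it only removes the manual step.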

justbaker 2814 days ago

I do know about headless browsers. The comment above mine described a manual process, with no scripts or headless browsers. A combination of curl, wkhtmltopdf (or another HTML-to-PDF tool) and a for loop can do this in a Bash one-liner.
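The loop described above can be sketched in Python as well. This version skips curl, since wkhtmltopdf accepts a URL as its input argument (`wkhtmltopdf <input> <output.pdf>`). The output naming scheme and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def batch_commands(urls: Iterable[str], out_dir: str) -> List[List[str]]:
    """One wkhtmltopdf invocation per URL, numbered doc-0001.pdf, doc-0002.pdf, ..."""
    out = Path(out_dir)
    return [
        ["wkhtmltopdf", url, str(out / f"doc-{i:04d}.pdf")]
        for i, url in enumerate(urls, start=1)
    ]


def run_batch(urls: Iterable[str], out_dir: str) -> None:
    """Create the output directory, then convert each URL in turn."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in batch_commands(urls, out_dir):
        subprocess.run(cmd, check=True)  # stop on the first failed conversion
```

Swapping `"wkhtmltopdf"` for another converter only changes the argv built in `batch_commands`.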

fermigier 2820 days ago

Prince is cool; I used it 10 years ago or so. No fuss about that.

It's a bit pricey, though (at least, pricier than "free"), so we're using WeasyPrint on more recent projects.

WeasyPrint is open source and written in Python. It's much slower than Prince, but this can be mitigated by caching renderings. I wouldn't bet that it's as standards-compliant or bug-free as Prince, but it's good enough for us.
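One way to cache renderings is sketched below, assuming the same HTML always yields the same PDF. It keys an in-memory dict on a SHA-256 hash of the input and calls WeasyPrint's `HTML(string=...).write_pdf()`, which returns the PDF as bytes when no target is given. The class and function names are hypothetical, and the cache is unbounded and ignores changes to external stylesheets or images referenced by the HTML.

```python
import hashlib
from typing import Callable, Dict


def weasyprint_render(html_source: str) -> bytes:
    """Render HTML to PDF bytes with WeasyPrint (pip install weasyprint)."""
    from weasyprint import HTML  # lazy import so the cache works without WeasyPrint installed
    return HTML(string=html_source).write_pdf()


class CachedRenderer:
    """Memoise PDF output keyed by a SHA-256 hash of the input HTML."""

    def __init__(self, render: Callable[[str], bytes] = weasyprint_render) -> None:
        self._render = render
        self._cache: Dict[str, bytes] = {}

    def render(self, html_source: str) -> bytes:
        key = hashlib.sha256(html_source.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._render(html_source)  # slow path: actually render
        return self._cache[key]
```

In production the dict would more likely be replaced by a disk or shared cache, but the keying idea is the same.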

When (or if) our customers ask for more speed or pixel-perfect support (with the $$$ to match), we will definitely try Prince again.

marmaduke 2820 days ago

The Java class caught my eye. Is that a wrapper around a native library, or do you make RPC calls to something?

HTML to PDF is something I'd never thought about, since Firefox does it (and the results usually aren't great).

mikeday 2820 days ago

It's a wrapper around the native process, just to simplify passing command-line arguments. (There is also a persistent process mode for speeding up batch processing of many small documents.)

The browsers don't specialise in PDF generation, and we do :)
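The Java class itself isn't shown in this thread. As a rough analogue of what "a wrapper around the native process" means, here is a Python sketch that turns keyword arguments into a `prince` command line and runs it. It assumes the `prince` binary is on the PATH and uses the `--style=FILE` and `-o` options as we understand Prince's CLI (check `prince --help` for your version). The persistent process mode is not sketched, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def prince_command(inputs: Sequence[str], output: str,
                   styles: Sequence[str] = (), prince: str = "prince") -> List[str]:
    """Build argv: extra style sheets first, then input documents, then the output PDF."""
    cmd = [prince]
    cmd += [f"--style={s}" for s in styles]
    cmd += list(inputs)
    cmd += ["-o", output]
    return cmd


def convert(inputs: Sequence[str], output: str, styles: Sequence[str] = ()) -> None:
    """Spawn one prince process per call; raises CalledProcessError on failure."""
    subprocess.run(prince_command(inputs, output, styles), check=True)
```

Spawning a process per document is the simple case; for many small documents, the per-process startup cost is what a persistent mode would avoid.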