Hacker News new | ask | show | jobs
by PaulKeeble 236 days ago
About the worst job on any enterprise software project is the PDF output, they always end up doing it for emails or something else and its a never ending list of bugs. Text formatting is a never ending list of problems since its so got a lot of vague inputs and a relatively strict output. Far too many little details go wrong.
3 comments

With PDF, my best approach was to go very low level. I've used PDFKit and PDFBox libraries and both provide a way to output vector operations. It allows to implement extremely performant code. The resulting PDF is tiny and looks gorgeous (because it's vector). And you can implement anything. Code will be verbose, but it's worth it.

I even think that it's viable to output PDF without any libraries. I've investigated that format a bit and it doesn't seem too complicated, at least for relatively dumb output.

We've spent twenty years working on HTML to PDF conversion and I expect we could easily spend another twenty years, so feel free to give Prince a try if you would rather avoid the headache :)
Normally when we are nearly there we say its 95% done and only 95% of the work remains. If your feeling is you are half done I suspect more than 50% of the work remains!

What I know having done a lot in this space is we aren't close!

yeah, we often refer to the first 90% of the work and the second 90% of the work lol.
Awesome. From curiosity: is Prince's core still written in Mercury? (Looked at old comments.)
Absolutely! The CSS support, layout engine, PDF output, and JavaScript interpreter are all written in Mercury, while the font support that was originally a mix of Mercury and C has now been rewritten as a standalone Rust project, Allsorts.
Thinking about phantomjs and rasterize brings back nightmares