by MrSaints 3216 days ago
Core developer of `athenapdf` here :)

I had a quick look at `pdf-bot`, and though we both rely on the same underlying technology (we are only just moving to headless Chromium; we were on Electron before), I believe we have slightly different ambitions for our respective projects. But I may be biased.

For example, `pdf-bot` seems to be tied exclusively to a specific converter and storage backend. With `athenapdf`, however, we are moving more and more towards building a toolkit, or rather a framework, for other people to construct their own conversion processes (or even microservices)[0].

Consequently, we are working towards general abstractions, such as fetching, converting, and uploading, that can have different implementations (e.g. wkhtmltopdf, LibreOffice, WeasyPrint).
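To make the fetch / convert / upload split concrete, here is a minimal Python sketch of how such stage abstractions might look. It is an illustration under our own assumptions: `Document`, `Fetcher`, `Converter`, `Uploader`, and `run_pipeline` are hypothetical names, not athenapdf's actual API (see [0] for that), and the toy classes merely stand in for real backends like wkhtmltopdf or headless Chromium.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class Document:
    """Hypothetical container for content moving through the pipeline."""
    data: bytes
    mime_type: str


class Fetcher(Protocol):
    def fetch(self, source: str) -> Document: ...


class Converter(Protocol):
    def convert(self, doc: Document) -> Document: ...


class Uploader(Protocol):
    def upload(self, doc: Document) -> str: ...


def run_pipeline(source: str, fetcher: Fetcher, converter: Converter, uploader: Uploader) -> str:
    """Fetch, convert, then upload; each stage can be swapped independently."""
    doc = fetcher.fetch(source)
    pdf = converter.convert(doc)
    return uploader.upload(pdf)


# Toy implementations (hypothetical) standing in for real backends.
class StaticFetcher:
    def fetch(self, source: str) -> Document:
        return Document(f"<h1>{source}</h1>".encode(), "text/html")


class FakePdfConverter:
    def convert(self, doc: Document) -> Document:
        # A real converter would invoke wkhtmltopdf, headless Chromium, etc.
        return Document(b"%PDF-1.4 " + doc.data, "application/pdf")


class MemoryUploader:
    def __init__(self) -> None:
        self.store: Dict[str, Document] = {}

    def upload(self, doc: Document) -> str:
        key = f"doc-{len(self.store)}"
        self.store[key] = doc
        return key
```

Swapping in, say, a LibreOffice-backed converter would only mean passing a different object with a `convert` method; the pipeline function itself stays unchanged.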

With our microservice assembly, we are also focusing heavily on ensuring we have:

1. Instrumentation and metrics (which `pdf-bot` appears to currently lack)

2. Support for different retry mechanisms (e.g. retry using the same converter or retry using a different converter)

3. Support for multiple input MIME types

4. Synchronous API calls (`pdf-bot` appears to be mostly asynchronous, with batch processing and callbacks)

5. Ease of installation (e.g. Docker) and configuration
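The two retry strategies in item 2 (retry with the same converter, or fall back to a different one) can both be expressed as one nested loop. This is a minimal Python sketch under stated assumptions: `convert_with_retry`, `ConversionError`, and the idea of a converter as a plain function are hypothetical, not athenapdf's implementation, and backoff is omitted for brevity.

```python
from typing import Callable, List, Sequence

# Hypothetical: a converter is any function taking a source and returning PDF bytes.
ConvertFn = Callable[[str], bytes]


class ConversionError(Exception):
    """Raised by a converter (or by the retry loop) when conversion fails."""


def convert_with_retry(source: str, converters: Sequence[ConvertFn],
                       attempts_per_converter: int = 2) -> bytes:
    """Try each converter in order, retrying each up to attempts_per_converter times.

    attempts_per_converter > 1 gives "retry using the same converter";
    more than one entry in converters gives "retry using a different converter".
    """
    errors: List[Exception] = []
    for convert in converters:
        for _ in range(attempts_per_converter):
            try:
                return convert(source)
            except ConversionError as exc:
                errors.append(exc)  # remember the failure, then retry or move on
    raise ConversionError(f"all {len(errors)} attempts failed") from (errors[-1] if errors else None)
```

For example, with `[chromium_convert, wkhtmltopdf_convert]` (hypothetical functions), a page that keeps failing in the first converter would be retried there once more before falling back to the second.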

We also have a CLI assembly[1] that supports custom JavaScript plugins[2] (e.g. Markdown -> PDF, Readability), so you don't need to run a service or make API calls for conversions.

[0] https://github.com/arachnys/athenapdf/tree/v3/pkg

[1] https://github.com/arachnys/athenapdf/blob/v3/cmd/cli/main.g...

[2] https://github.com/arachnys/athenapdf/tree/v3/pkg/runner/plu...


Thank you for athenapdf and for rescuing me from the pains of wkhtmltopdf - I am a happy user. :)

My only small problem with it was the somewhat complex setup for using athenapdf-service with a new project (especially since I use docker-machine), but I have now mostly automated the whole thing.

Just out of interest: do you consider asynchronous processing an advantage (being a Node developer, I generally love async)? Not that it matters to me; my needs are trivial for the service to handle.

Edit: actually, I can see how async would make my life much more complicated for my simple use case. I would have to write something to track requests and responses rather than just looping through a bunch of URLs that need converting.

That's interesting feedback! Thank you :)

We actually went with Docker for the setup because it simplified dependency management tremendously, and it allowed us to deploy on platforms like Kubernetes, Swarm, and ECS. As a bonus, it gave us some confidence that if it works for us, it should work for others (although, admittedly, we have come across cases where Docker behaves differently across platforms).

I consider asynchronous processing advantageous in some cases (in this context). Indeed, when we were refactoring `athenapdf`, we considered introducing a message queue for workers to pull work from, and to put results back on when the work completes. The problem with this, however, is that we can't scale horizontally as easily (i.e. introduce node replicas behind a load balancer): if we tried to get or update a job, the request might not reach the node that originally accepted it. The solution could be as simple as introducing a centralised message queue of sorts (or even sticky sessions), but that complicates the setup process, so we decided against it.
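The horizontal-scaling problem above can be reproduced in a few lines. This is a toy Python model under our own assumptions (`Node`, `RoundRobinBalancer`, and the in-memory job dict are hypothetical, not athenapdf code): each replica tracks jobs locally, and a round-robin balancer without sticky sessions sends the follow-up status request to a different replica.

```python
import itertools
import uuid
from typing import Dict, List, Optional


class Node:
    """A worker replica that tracks jobs in a dict (its own, unless a shared one is passed in)."""

    def __init__(self, name: str, store: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        # Without a shared store, each node only knows about jobs it accepted itself.
        self.jobs: Dict[str, str] = store if store is not None else {}

    def submit(self, url: str) -> str:
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = "queued"
        return job_id

    def status(self, job_id: str) -> Optional[str]:
        return self.jobs.get(job_id)


class RoundRobinBalancer:
    """Sends each incoming request to the next node in turn (no sticky sessions)."""

    def __init__(self, nodes: List[Node]) -> None:
        self._cycle = itertools.cycle(nodes)

    def route(self) -> Node:
        return next(self._cycle)
```

With two independent nodes, a job submitted via node "a" is invisible when the next request lands on node "b". Passing the same dict to every node stands in for the centralised queue or store mentioned above; a sticky session would instead pin each job's requests to one node. Both fix the lookup, at the cost of extra setup.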

Taken together, for our specific use cases, we believe it is a lot simpler to consume a synchronous API. No webhooks or callbacks. No polling. No concerns over acknowledgement. If an HTTP call fails, we will know about it immediately. If a complex retry mechanism is needed, we think it should be implemented in the client application.
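Consuming a synchronous API, with any retry logic kept in the client, can look like the simple URL loop described earlier in the thread. This is a minimal Python sketch using only the standard library; the `GET {service}/convert?url=<target>` endpoint that returns PDF bytes is a hypothetical placeholder, not athenapdf-service's documented API.

```python
import time
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Optional


def convert_url(service: str, target: str, retries: int = 2, backoff: float = 0.5) -> bytes:
    """Synchronously convert one URL, retrying with exponential backoff in the client.

    Assumes a hypothetical GET {service}/convert?url=<target> endpoint that
    returns the PDF bytes in the response body.
    """
    query = urllib.parse.urlencode({"url": target})
    last_error: Optional[OSError] = None
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(f"{service}/convert?{query}", timeout=30) as resp:
                return resp.read()
        except OSError as exc:  # URLError, HTTPError (non-2xx), and socket timeouts are OSErrors
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"conversion of {target} failed") from last_error


def convert_all(service: str, targets: Iterable[str]) -> Dict[str, bytes]:
    # No webhooks, no polling: each call either returns the PDF or raises immediately.
    return {target: convert_url(service, target) for target in targets}
```

Because each call blocks until the PDF is ready, there is no job ID to track. The trade-off is that a slow conversion ties up the client for its duration.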

In the long term, I believe we should have a toolkit that can easily be plugged into a wider orchestration engine like Conductor (https://netflix.github.io/conductor/). That way, anyone can build their own conversion pipeline with ease.

Hi, and thanks for athenapdf!

I was wondering: does it support custom page headers (or running elements in general)?

Unfortunately not, though we already have an issue filed for that. It's something CSS Paged Media is supposed to solve.