| HN Mirror

That's interesting feedback! Thank you :)

We actually went with Docker for the set up because it simplified dependency management tremendously, and it allowed us to deploy on platforms like Kubernetes, Swarm, and ECS. As a plus, it gave us some confidence that if it works for us, it should work for others (obviously, we have come across cases where Docker behaves differently across platforms).

I consider asynchronous processing (in this context) as advantageous in some cases. Indeed, when we were refactoring `athenapdf`, we considered introducing a message queue for workers to pull work from, and to put back when the work completes. The problem with this however, is that we can't as easily scale horizontally (i.e. introduce node replicas behind a load balancer), as if we tried to get / update a job, we may not get the same node we originally got. I mean, the solution can be as easy as introducing a centralised message queue of sorts (or even a sticky session), but that complicates the set up process, so we decided against it.

Taken together, for our specific use cases, we believe it is a lot simpler to consume a synchronous API. No webhooks / callbacks. No polling. No concerns over acknowledgement. If a HTTP call fails, we will know about it immediately. If a complex retry mechanism is needed, we think this should be accomplished in the client application.

In the long term, I believe we should have a toolkit that can easily be plugged into a wider orchestration engine like Conductor (https://netflix.github.io/conductor/). That way, anyone can develop their own conversion process pipeline with ease.