Hacker News
Ask HN: Service to convert between different document formats
2 points by casvc 3158 days ago
Hey HN guys and girls,

We are working on a SaaS API to convert between different document formats (e.g. Excel/CAD/Email/whatever to PDF/TXT). I am wondering: which conversion pairs (source format / destination format) are of interest to you, and why?

Thank you!

2 comments

How is your service going to be different from existing file conversion services like zamzar.com, convertfiles.com, etc.?

Thanks for asking. The main difference is a focus on depth instead of breadth: instead of a multitude of possible output formats, we support only a few (PDF/HTML/TXT/IMG), but with some added features. Just a few examples:
- bulk search and autoredaction (marking / blacking out parts of documents that match certain queries)
- signature and handwriting detection
- tokenization (for TXT output)
- language detection (for TXT/PDF output)
- named entity detection (for TXT/PDF output)

Potential customers are people developing systems for GDPR (data protection), fraud detection, eDiscovery and content management.
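
As a rough illustration of what autoredaction on TXT output could look like, here is a minimal Python sketch. It assumes the redaction queries are plain regular expressions; the `redact` function, the example patterns, and the sample text are all hypothetical and are not part of the actual service.

```python
import re

def redact(text, patterns, mask="#"):
    """Black out every match of any pattern, keeping the text length unchanged.

    Returns the redacted text and the (start, end) spans that were masked.
    Hypothetical helper for illustration only.
    """
    if not patterns:
        return text, []
    # Combine all queries into one alternation so the text is scanned once.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    spans = [m.span() for m in combined.finditer(text)]
    redacted = combined.sub(lambda m: mask * len(m.group(0)), text)
    return redacted, spans

# Example queries (assumed): a simple email pattern and a short phone pattern.
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.-]+"
PHONE = r"\d{3}-\d{4}"

sample = "Contact jane@example.com or call 555-0100."
redacted, spans = redact(sample, [EMAIL, PHONE])
print(redacted)
print(spans)
```

Returning the spans alongside the masked text lets a caller apply the same redactions to another output format (for example, drawing boxes in a PDF) instead of only replacing characters.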

If you are doing some kind of intensive annotation, the most important thing is probably having an output format that supports the annotation you want to do, not necessarily supporting every format.

I have been thinking about universal annotation and the formats that I find the most interesting are PDF (because so much content exists in PDF) and HTML (open, easy to work with.)

You are absolutely right, and we are thinking along the same lines. The only reason we offer TXT/IMG as output formats next to PDF/HTML is that some people will have their own composite document formats, and they can build those out of TXT/IMG.
I am curious who the customers for this service are and what document formats they deal with.