Hacker News
by jay888 3146 days ago
How is your service going to be different from existing file conversion services like zamzar.com, convertfiles.com, etc.?

Thanks for asking. The main difference is a focus on depth instead of breadth: instead of a multitude of possible output formats, we support only a few (PDF/HTML/TXT/IMG), but with some added features. Just a few examples:
- bulk search and autoredaction (marking / blacking out parts of documents that match certain queries)
- signature and handwriting detection
- tokenization (for TXT output)
- language detection (for TXT/PDF output)
- named entity detection (for TXT/PDF output)
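To make the autoredaction idea concrete for the TXT case, here is a minimal sketch. It is not the service's actual implementation: the function name `redact`, the `mask` parameter, and the choice of case-insensitive literal matching are all assumptions made for illustration.

```python
import re

def redact(text, queries, mask="█"):
    """Black out every case-insensitive literal match of any query.

    Hypothetical helper for illustration only; a real redaction
    pipeline would also have to handle PDF/IMG content, not just text.
    """
    # Ignore empty queries, which would otherwise match everywhere.
    queries = [q for q in queries if q]
    if not queries:
        return text
    # Try longer queries first so they win over shorter ones
    # that start at the same position.
    ordered = sorted(queries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(q) for q in ordered), re.IGNORECASE)
    # Replace each match with a mask of the same length, so the
    # layout of the surrounding text is preserved.
    return pattern.sub(lambda m: mask * len(m.group()), text)

print(redact("Call Jane Doe on Monday.", ["jane doe"], mask="X"))
# prints "Call XXXXXXXX on Monday."
```

Keeping the mask the same length as the match is one possible design choice; a fixed-length mask would hide the length of the redacted value as well.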

Potential customers are people developing systems for GDPR (data protection), fraud detection, eDiscovery and content management.

If you are doing some kind of intense annotation, probably the most important thing is having an output format that supports the annotation you want to do, not necessarily supporting every format.

I have been thinking about universal annotation, and the formats that I find the most interesting are PDF (because so much content exists in PDF) and HTML (open, easy to work with).

You are absolutely right - we are thinking along the same lines. The only reason we are offering TXT/IMG as output formats next to PDF/HTML is that some people will have their own composite document formats, and they can build those out of TXT/IMG.