| HN Mirror



	by casvc 3157 days ago
	Thanks for asking. The main difference is a focus on depth instead of breadth: instead of a multitude of possible output formats, we support only a few (PDF/HTML/TXT/IMG), but with some added features. Just a few examples:
	- bulk search and autoredaction (marking / blacking out parts of documents that match certain queries)
	- signature and handwriting detection
	- tokenization (for TXT output)
	- language detection (for TXT/PDF output)
	- named entity detection (for TXT/PDF output)
	Potential customers are people developing systems for GDPR (data protection), fraud detection, eDiscovery and content management.
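	To make the autoredaction idea concrete, here is a minimal sketch of blacking out query matches in plain text. It is not the product's implementation: the `redact` function, the regex-based queries, and the sample string are all assumptions chosen for illustration, and a real system would also have to handle PDF/HTML layout.

```python
import re

def redact(text: str, patterns: list[str], mask: str = "█") -> str:
    """Black out every span of `text` that matches any of `patterns`.

    Hypothetical helper: each match is replaced by mask characters of
    the same length, so the layout of the surrounding text is preserved.
    """
    for pattern in patterns:
        text = re.sub(pattern, lambda m: mask * len(m.group(0)), text)
    return text

# Assumed example queries: a simple email pattern and a short phone pattern.
QUERIES = [r"[\w.+-]+@[\w-]+\.[\w.-]+", r"\b\d{3}-\d{4}\b"]

sample = "Contact jane@example.com or call 555-0100."
print(redact(sample, QUERIES))
```

	Keeping the mask the same length as the match is one possible design choice; it keeps character offsets stable, which helps if other annotations refer to positions in the same text.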

1 comments

PaulHoule 3155 days ago

If you are doing some kind of intense annotation, the most important thing is probably having an output format that supports the annotation you want to do -- not necessarily supporting every format.

I have been thinking about universal annotation and the formats that I find the most interesting are PDF (because so much content exists in PDF) and HTML (open, easy to work with.)


casvc 3154 days ago

You are absolutely right - we are thinking along the same lines. The only reason we offer TXT/IMG as output formats alongside PDF/HTML is that some people have their own composite document formats, which they can build out of TXT/IMG.
