by saradhi 2275 days ago
Reliably*, but with limitations:

  1. Text PDFs only

  2. A template is needed

From talking with 50-60 paying customers of a service I maintain, the most common concerns I hear are:

  1. No, we get scanned PDFs as well

  2. No, we deal with a lot of customers and cannot template them all

Yes, agreed that this is indeed a difficult tradeoff.

Just curious, does extracttable.com provide an offline mode? It seems that quite a few people can't submit their documents to a third-party server.

We are far away from an offline mode. Maybe a 2-3 years later thing, if we survive until then.

Btw, did you check OpenCV's morphological line detection? It helps auto-detect the evident cell/row boundaries, reducing human effort.
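
For anyone unfamiliar with the idea, here is a rough sketch of morphological line detection. It is a pure-Python stand-in, not OpenCV code: in OpenCV the same opening would typically be done with `cv2.getStructuringElement` and `cv2.morphologyEx` on a binarized image. The function names, the tiny `page` grid, and the kernel lengths are all made up for illustration. The trick is that opening with a long, thin line element keeps only ink runs at least that long, so ruling lines survive and text strokes drop out.

```python
def open_horizontal(grid, k):
    """Binary opening with a 1 x k horizontal line element.

    For a binary image this is the same as keeping only horizontal
    runs of at least k consecutive foreground (1) pixels.
    """
    out = [[0] * len(row) for row in grid]
    for y, row in enumerate(grid):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= k:  # run is long enough to be a ruling line
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def open_vertical(grid, k):
    # A vertical opening is a horizontal opening on the transposed image.
    return transpose(open_horizontal(transpose(grid), k))


def line_positions(grid, k_h, k_v):
    """Return (rows containing horizontal rules, columns containing vertical rules)."""
    h = open_horizontal(grid, k_h)
    v = open_vertical(grid, k_v)
    rows = [y for y, row in enumerate(h) if any(row)]
    cols = [x for x, col in enumerate(transpose(v)) if any(col)]
    return rows, cols


# Hypothetical binarized page (1 = ink): a 2x2 table with ruling lines
# plus a few short "text" strokes inside the cells.
page = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]

rows, cols = line_positions(page, k_h=5, k_v=5)
print(rows, cols)
```

The detected rows and columns are candidate cell boundaries a person could then confirm or adjust. On real scans the kernel lengths would need tuning to page size, and this obviously finds nothing on tables without ruling lines.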

I did a bit, but there are tables that simply don't have any lines. Given that human beings are just so much better at detecting tables than AI, I figured maybe the solution is to help people quickly draw the table they see, rather than tweaking the algorithm, hence this new product.