by grayjay 1882 days ago
This technology is of intense interest to organizations that process thousands of forms per day, such as the Internal Revenue Service, insurance companies, and banks. It is very much worth their while to optimize that processing, and they do.

The cloud services reviewed here are typically components in a much larger end-to-end process. They are valuable because they are fast, and work at scale.

Suggesting that the technology is useless because it can't parse a random set of 51 invoices misses the actual use case for which these services are appropriate.

1 comment

I think it’s actually the other way round. The IRS, banks and insurers design and mandate the use of their own forms, and if you deviate one bit from their form they will happily deny your request, or even just plain ignore you. They simply have the power to put the burden on the user to fill in the form in a way that their system can process. That’s not really who this software is aimed at. Sure, the IRS and others can and probably will use these advanced extraction services, but only once the technology is readily available and reasonably priced.

Instead, this software is aimed at companies processing all sorts of documents with structured data, but without (very) strict form requirements (or with very low compliance with those requirements). Processing invoices is actually one of the best examples out there: every company has to do it, the basic data structure is nearly universally identical, and yet the layout varies so much that invoices are complex to process with general-purpose tools (hence the specially designed tools for invoice recognition). These companies may find great value in processing these forms automatically and may be willing to pay for advanced text extraction tools, because their only alternative is manual processing by humans (possibly tool-aided).