by tastyminerals
2332 days ago
It's a nice overview article for anyone interested in information extraction (IE) from financial documents. However, for industrial-level solutions, Tesseract does not cut it. ABBYY is currently the best OCR engine on the market. Receipt IE works just fine with rules backed by a small BiLSTM as a fallback, simply because receipts do not contain a lot of text. With invoices, this approach is suboptimal. On a general note, no DL approach will give you results that are both fast enough and good enough, because any advanced network would be too slow and too generic. If extraction takes more than 5 seconds, it is hard to sell such a system.
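A minimal sketch of what "rules first, model as fallback" could look like for receipts. Everything here is an assumption for illustration: the `RULES` patterns, the `extract_fields` helper, and the `fallback` callable, which merely stands in for a small BiLSTM tagger and is not any real library's API.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical rule set: one regex per field, tried before any model runs.
RULES = {
    "total": re.compile(
        r"^\s*total\b[^\d\n]*(\d+[.,]\d{2})\s*$",
        re.IGNORECASE | re.MULTILINE,
    ),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}[./]\d{2}[./]\d{4})\b"),
}

# Signature assumed for the fallback: (field name, OCR text) -> value or None.
Fallback = Callable[[str, str], Optional[str]]


def no_model(field: str, text: str) -> Optional[str]:
    """Placeholder fallback used when no model is plugged in."""
    return None


def extract_fields(text: str, fallback: Fallback = no_model) -> Dict[str, Optional[str]]:
    """Apply the cheap rules first; call the fallback only for missed fields."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            result[field] = match.group(1)
        else:
            # In a real system this is where a small sequence tagger would run.
            result[field] = fallback(field, text)
    return result


receipt = "ACME STORE\n2021-03-04\nCoffee 3.50\nTOTAL 12.40\n"
fields = extract_fields(receipt)
```

The point of the design is that the model only runs on the fields the rules miss, which keeps the common case cheap.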
|
Yeah, but see, I have this shiny hammer, and if I squint just right, everything looks like a nail!
The seasonal trends in the programmer world get kinda tiring after a while, and so do the people peddling the latest hot new thing, as in this thinly veiled promo piece for Nanonets.
In science, there's this concept of falsifiability. If a theory can't be disproven, it isn't a scientific theory at all. The same goes for technological evangelism. If no one knows what a piece of tech is bad at, how the hell would you know if it's any good at anything, really? There are no panaceas, no one-tool-to-rule-them-all, no single piece of tech that will usher in a new golden age for programmerkind. They're just tools in a toolbox. Know what each tool is good at. Know what each tool is bad at. Don't forget your old tools, just because the newest tool is still very shiny.
Platitudes, I know, but still.