by throwaway4496
315 days ago
We process invoices from around the world, so we see more PDF generators than I care to count. It is a hard problem for sure, but the hard part is the rendering, and you can't escape that by rasterizing: rasterizing is rendering. So it is absurd to pretend you can solve the rendering problem by rendering into an image instead of a structured format. By rendering into a raster, you now have three problems: parsing the PDF, rendering a quality raster, and then OCR'ing that raster. It is mind-numbingly absurd.

If your PDF renders a part of the sentence at the beginning of the document, a part in the middle, and a part at the end, split between multiple sections, it's still rather trivial to render.
To parse and understand that this is the same sentence? A completely different matter.
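
To make that concrete, here is a rough Python sketch of the gap between the two. Everything in it is made up for illustration: the `Fragment` type, the coordinates (with `y` measured from the top of the page), and the sample text are assumptions, not the output of any real PDF library. A renderer only needs each fragment's position, so the order in the content stream doesn't matter to it. A parser that reads the stream in order gets scrambled text, and even the position sort below is only a heuristic for the easy single-column case.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A hypothetical positioned text run, as a PDF content stream might emit it."""
    x: float  # horizontal offset from the left edge
    y: float  # vertical offset from the top edge (an assumption for this sketch)
    text: str


# Fragments in the order they appear in the (made-up) content stream.
# A renderer draws each one at (x, y), so this order is irrelevant to it.
stream = [
    Fragment(x=120.0, y=50.0, text="due on"),
    Fragment(x=50.0, y=50.0, text="Invoice total"),
    Fragment(x=170.0, y=50.0, text="receipt"),
]


def stream_order_text(frags):
    """Naive extraction: concatenate text in content-stream order."""
    return " ".join(f.text for f in frags)


def position_order_text(frags, line_tolerance=2.0):
    """Heuristic extraction: bucket fragments into lines by y, then sort by x.

    This only recovers reading order for simple single-column layouts.
    """
    ordered = sorted(frags, key=lambda f: (round(f.y / line_tolerance), f.x))
    return " ".join(f.text for f in ordered)


if __name__ == "__main__":
    print(stream_order_text(stream))
    print(position_order_text(stream))
```

On this toy input the stream-order read comes out scrambled and the position sort recovers the sentence. Multi-column pages, tables, rotated text, or a sentence genuinely split across sections break the sort, which is the "completely different matter" part.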