by throwaway4496
324 days ago
So you parse PDFs, but also OCR images, to somehow get better results? Do you know you could just use the parsing engine that renders the PDF to get the output? I mean, why rasterize it, OCR it, and then use AI? Sounds like creating a problem just so you can use AI to solve it.

Another thing is that most document parsing tasks are going to run into a significant volume of PDFs which are actually just a bunch of scans/images of paper, so you need to build this capability anyways.
TL;DR: PDFs are basically steganography
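
A rough sketch of how the two positions combine in practice: try the text layer first, and only fall back to rasterize-and-OCR for pages where the parser returns little or nothing, which is what scanned pages usually look like. Everything here is an assumption for illustration: `needs_ocr`, `plan_pages`, and the 20-character threshold are hypothetical, and the per-page strings are assumed to come from whatever PDF parser you already use.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True when a page's text layer looks empty or near-empty.

    min_chars is an arbitrary illustrative threshold, not a tuned value.
    """
    # Count only non-whitespace characters, since a scanned page often
    # yields an empty string or a few stray spaces and newlines.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible < min_chars


def plan_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Label each page 'text' (use the parser output) or 'ocr' (rasterize and OCR)."""
    return ["ocr" if needs_ocr(t, min_chars) else "text" for t in page_texts]


if __name__ == "__main__":
    # Hypothetical per-page output from a PDF parser: one born-digital
    # page followed by two pages with no usable text layer.
    pages = ["Invoice 2024-01: total due 1,250.00 USD", "", "  \n "]
    print(plan_pages(pages))
```

A character count will not catch pages whose text layer exists but is garbage (bad font encodings, hidden junk text), so treat it as a first-pass filter rather than a full answer.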