by zffr
335 days ago
PDFs don’t always contain actual text. Sometimes they only contain instructions to draw the shapes of the letters, with no text layer to read. For that reason, IMO rendering a PDF page as an image is a very reasonable way to extract information from it. For the other formats you mentioned, I agree that it is probably better to parse the document instead.
|
Yeah, but when they do, it makes a difference.
Also, speaking from experience, most invoices do contain actual text.
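
To make the trade-off concrete, here is a rough sketch of "use the text layer when it exists, otherwise render the page". Everything in it is an assumption for illustration: `get_text_layer` and `render_and_ocr` are hypothetical callables you would back with whatever PDF and OCR tooling you already use, and the `min_chars` threshold of 20 is arbitrary.

```python
from typing import Callable


def choose_extraction(text_layer: str, min_chars: int = 20) -> str:
    """Return "parse" if the page's text layer looks usable, else "render".

    The heuristic (hypothetical) counts printable, non-whitespace characters.
    """
    usable = sum(1 for ch in text_layer if ch.isprintable() and not ch.isspace())
    return "parse" if usable >= min_chars else "render"


def extract_page(
    get_text_layer: Callable[[], str],
    render_and_ocr: Callable[[], str],
    min_chars: int = 20,
) -> str:
    """Prefer the embedded text; fall back to rendering the page as an image."""
    text = get_text_layer()
    if choose_extraction(text, min_chars) == "parse":
        return text
    # No usable text layer (e.g. letters drawn as shapes), so render instead.
    return render_and_ocr()
```

With something like this, invoices that carry real text take the cheap path, and only pages without a usable text layer pay for rendering.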