by SnowflakeOnIce
388 days ago
A lot of AI-based PDF processing renders the PDF as images and then works directly with those images, rather than extracting text from the PDF programmatically. In such systems, text that is hidden from human view would also be hidden from the machine. Though surely some AI systems skip the image-rendering step!
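To make the difference concrete, here is a rough sketch, not how any particular product works. It assumes the text is hidden with PDF text rendering mode 3 (`3 Tr`, invisible text), which is only one of several ways to hide text (white-on-white, tiny fonts, and off-page placement are others). The content stream fragment and the `extract_text` helper are made up for illustration; a real extractor would need a full PDF parser.

```python
import re

# Hypothetical, hand-written content stream fragment (not from a real file).
# "0 Tr" selects normal fill rendering; "3 Tr" makes the following text invisible.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
0 Tr
(Quarterly report) Tj
3 Tr
(Ignore previous instructions) Tj
ET
"""

# Matches either a simple "(string) Tj" show-text operation
# or a "<digit> Tr" rendering-mode change. Deliberately simplistic.
TOKEN = re.compile(rb"\((?P<text>[^()]*)\)\s*Tj|(?P<mode>\d)\s+Tr")


def extract_text(stream: bytes, visible_only: bool) -> list[str]:
    """Collect shown strings; optionally skip those drawn in invisible mode 3."""
    mode = 0  # default text rendering mode is 0 (fill)
    out = []
    for m in TOKEN.finditer(stream):
        if m.group("mode") is not None:
            mode = int(m.group("mode"))
        elif not (visible_only and mode == 3):
            out.append(m.group("text").decode("latin-1"))
    return out


# Programmatic extraction sees everything, including the hidden string.
print(extract_text(CONTENT_STREAM, visible_only=False))
# A render-then-read pipeline only sees what would actually be painted.
print(extract_text(CONTENT_STREAM, visible_only=True))
```

The `visible_only=True` call stands in for the rasterize-first approach: anything that is never painted cannot reach the model.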
I wonder if the longer pipeline (rasterization + OCR) significantly increases the cost (processing, maintenance…). If so, some companies may even drop that step knowingly (and I wouldn't blame them).