|
|
|
|
|
by simonw
314 days ago
|
|
I convert the PDF into an image per page, then dump those images into either an OCR program (if the PDF is a single column) or a vision-LLM (for double columns or more complex layouts). Some vision LLMs can accept PDF inputs directly too, but you need to check that they're going to convert to images and process those rather than attempting and failing to extract the text some other way. I think OpenAI, Anthropic and Gemini all do the images-version of this now, thankfully. |
|
I expect the current LLMs significantly improve upon the previous ways of doing this, e.g. Tesseract, when given an image input? Is there any test you're aware of for model capabilities when it comes to ingesting PDFs?