|
I've developed a Python API service that uses GPT-4o for OCR on PDFs. It features parallel processing and batch handling for improved performance. Not only does it convert PDF to markdown, but it also describes the images within the PDF using captions like `[Image: This picture shows 4 people waving]`. In testing with NASA's Apollo 17 flight documents, it successfully converted complex, multi-oriented pages into well-structured Markdown. The project is open-source and available on GitHub. Feedback is welcome. |
As others mentioned, consistency is key in parsing documents and consistency is not a feature of LLMs.
The output might look plausible, but without proper validation this is just a nice local playground that can’t make it to production.