Hacker News
by refulgentis 641 days ago
GPT-4o doesn't do actual OCR, and there are much smaller, more effective models built specifically for this problem.

I appreciate your work, your intent, and your sharing it. When you share something, it's very important to understand what you're doing and the context it sits in.

At that point, you are responsible for it, and the choices you make when communicating about it reflect on you.


I've found this method really useful for prepping PDFs before running them through AI. I mix it with traditional OCR for a hybrid approach. It's a game-changer for getting info out of tricky pages. Sure, you wouldn't bet the farm on it for a big, official project, but it's still pretty solid. If you're willing to spend a bit more, you can run extra prompts to check whether it skipped any content. It's a lot of work, though, and probably best left to companies that specialize in this stuff.

I've been testing it out on pitch decks made in Figma and saved as JPGs. Surprisingly, the LLM OCR outperformed top dogs like SolidDocuments and PDFtron. Since I'm mainly after getting good context for the LLM from PDFs, I've been using this hybrid setup, bringing in the LLM OCR for pages that need it. In my book, this API is perfect for these kinds of situations.
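For anyone wondering what "bringing in the LLM OCR for pages that need it" might look like, here is a minimal Python sketch of that kind of per-page routing. It assumes you already have a traditional OCR engine that reports a per-page confidence and some way to send a page image to an LLM. Both are passed in as plain functions, so no real OCR or LLM API is being claimed here. The names (`hybrid_extract`, `needs_llm`, `PageResult`), the 0.80 confidence cutoff, and the 40-character minimum are hypothetical choices, not anything from the thread or from a specific library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical signatures: plug in whatever OCR engine and LLM client you actually use.
TraditionalOCR = Callable[[object], Tuple[str, float]]  # page -> (text, confidence in 0..1)
LLMOCR = Callable[[object], str]                         # page -> extracted text


@dataclass
class PageResult:
    index: int
    text: str
    source: str  # "ocr" if the traditional pass was kept, "llm" if the page was escalated


def needs_llm(text: str, confidence: float,
              min_confidence: float = 0.80, min_chars: int = 40) -> bool:
    """Assumed heuristic: escalate a page when the traditional engine is unsure
    or returns very little text (common on image-heavy slides)."""
    return confidence < min_confidence or len(text.strip()) < min_chars


def hybrid_extract(pages: List[object], ocr: TraditionalOCR, llm: LLMOCR,
                   **thresholds) -> List[PageResult]:
    results = []
    for i, page in enumerate(pages):
        text, confidence = ocr(page)  # cheap pass on every page
        if needs_llm(text, confidence, **thresholds):
            # Costlier LLM pass only for pages the heuristic flags.
            results.append(PageResult(i, llm(page), "llm"))
        else:
            results.append(PageResult(i, text, "ocr"))
    return results


# Usage with stub engines standing in for real ones.
fake_ocr = {
    0: ("Quarterly revenue grew 40% year over year across all regions.", 0.95),
    1: ("Tm", 0.30),  # e.g. a slide that is mostly a stylized image
}
out = hybrid_extract([0, 1],
                     ocr=lambda p: fake_ocr[p],
                     llm=lambda p: f"[LLM text for page {p}]")
for r in out:
    print(r.index, r.source, r.text)
```

Keeping the thresholds as parameters makes it easy to tune how many pages get escalated, which is where most of the extra cost in a setup like this comes from.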