Hacker News
by elpalek 394 days ago
I recently tested OCR on a (non-English) PDF with Gemini 2.5 Pro. First, I asked it directly to extract the text from the PDF. Result: a random text blob, not usable.

Second, I converted the PDF into one JPG per page. Gemini performed exceptionally: near-perfect text extraction, with the formatting intact, in Markdown.
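For anyone who wants to try the same thing, here's a rough sketch of the PDF-to-JPG step. The post doesn't say which tool was used, so this assumes poppler's `pdftoppm` CLI is installed; the function names, the `page` prefix and the 200 DPI default are just illustrative choices.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=200):
    """Build the pdftoppm argv that renders each PDF page to a JPEG."""
    # -jpeg selects JPEG output, -r sets the render resolution in DPI.
    return ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def pdf_to_jpgs(pdf_path, out_dir, dpi=200):
    """Render every page of pdf_path into out_dir and return the image paths."""
    # Assumption: poppler-utils is installed and pdftoppm is on PATH.
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install poppler-utils")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / "page"
    subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
    # Collect whatever pdftoppm wrote under the "page" prefix.
    return sorted(out_dir.glob("page-*.jpg"))
```

Each resulting image can then be sent to the model as a separate page. A higher DPI may help with small print, at the cost of larger uploads.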

Maybe there's an internal difference in how the model processes a PDF vs. a JPG.

1 comment

The model probably isn't rendering the PDF, just looking in the file for text.
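One way to check this guess is to dump the PDF's embedded text layer and see whether it looks like the same garbage the model returned. This is a minimal sketch assuming poppler's `pdftotext` CLI is installed; the function names are made up for illustration, and it says nothing about what Gemini actually does internally.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext argv that prints the embedded text layer to stdout."""
    # -layout tries to preserve the physical layout; "-" sends output to stdout.
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def embedded_text(pdf_path):
    """Return the text stored inside the PDF, with no rendering or OCR involved."""
    # Assumption: poppler-utils is installed and pdftotext is on PATH.
    result = subprocess.run(pdftotext_command(pdf_path), check=True, capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")
```

If the output is empty or scrambled (common with scanned documents or odd font encodings), anything that only reads the text layer would see the same thing, while rendering to images sidesteps it.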