by binalpatel
146 days ago
This is admittedly dated, but even back in December 2023 GPT-4 with its Vision preview could do structured extraction very reliably, and I'd imagine Gemini 3 Flash is much better than that now. https://binal.pub/2023/12/structured-ocr-with-gpt-vision/

Back-of-the-napkin math (which I could be getting completely wrong), but I think you could process a 100-page PDF for ~$0.50 or less using Gemini 3 Flash:

>560 input tokens per page * 100 pages = 56,000 tokens = $0.028 input ($0.50/M input tokens)
>~1,000 output tokens per page * 100 pages = 100,000 tokens = $0.30 output ($3/M output tokens)
>Total: ~$0.33 per 100 pages

(https://ai.google.dev/gemini-api/docs/gemini-3#media_resolut...)
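The same napkin math as a tiny script, so you can plug in your own page counts or prices. This is a sketch that assumes the figures above (560 input tokens and ~1,000 output tokens per page, $0.50/M input and $3/M output); real token counts depend on the media resolution setting and the amount of text extracted, and prices may change. The function name `estimate_pdf_cost` is just a hypothetical helper, not part of any SDK.

```python
def estimate_pdf_cost(pages,
                      input_tokens_per_page=560,     # assumed per-page image token count
                      output_tokens_per_page=1000,   # assumed size of the extracted output
                      input_price_per_m=0.50,        # assumed $ per 1M input tokens
                      output_price_per_m=3.00):      # assumed $ per 1M output tokens
    """Return (input_cost, output_cost, total_cost) in dollars."""
    input_cost = pages * input_tokens_per_page * input_price_per_m / 1_000_000
    output_cost = pages * output_tokens_per_page * output_price_per_m / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


if __name__ == "__main__":
    i, o, t = estimate_pdf_cost(100)
    print(f"input ${i:.3f}, output ${o:.3f}, total ${t:.3f}")
    # prints: input $0.028, output $0.300, total $0.328
```

Output tokens dominate the bill here, so the per-page output size is the number worth checking against your own documents.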