by bambax
542 days ago
I just tested Gemini 1.5 Flash (interactively on Google AI Studio) and the results are far from acceptable. OCR seems good, on par with Google Vision. But the footnotes are not properly identified on most pages. They are identified correctly when there is a large gap and the first line of the footnotes starts with a number; but when the footnote block starts with text (continuing a footnote from the previous page) and/or the gap is small or almost non-existent, it fails: all text on the page is treated as belonging to the main text. But the main problem isn't even that, it's that it takes between 10 and 20 seconds per page. At the slow end, that would mean over three hours per 600-page volume (and well over an hour and a half even at the fast end). Google Vision takes less than one second per page. It's possible there is a setup cost and that doing batches or even full PDFs would be faster, though. Do you have experience with this? And can you maybe share "prompt secrets" that would improve the results...?
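
For reference, here is the back-of-the-envelope arithmetic behind those timings as a small Python sketch. It assumes pages are processed strictly one after another, with no batching or parallelism, and it treats "less than one second" for Google Vision as an upper bound of 1 second per page. `volume_hours` is just a helper name made up for this illustration.

```python
def volume_hours(pages: int, seconds_per_page: float) -> float:
    """Total time in hours when pages are processed one at a time."""
    return pages * seconds_per_page / 3600


PAGES = 600  # pages per volume, as in the comment above

# Per-page timings in seconds: the observed 10-20 s range for Gemini 1.5 Flash,
# and an assumed 1 s upper bound for Google Vision.
timings = [
    ("Gemini 1.5 Flash (fast end)", 10),
    ("Gemini 1.5 Flash (slow end)", 20),
    ("Google Vision (upper bound)", 1),
]

for label, seconds in timings:
    print(f"{label}: {volume_hours(PAGES, seconds):.2f} h")
```

So the "over three hours" figure corresponds to the 20-second end of the range, against roughly ten minutes at most for Google Vision.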