by jbarrow
491 days ago
> Unfortunately Gemini really seems to struggle on this, and no matter how we tried prompting it, it would generate wildly inaccurate bounding boxes.

Qwen2.5 VL was trained on a special HTML format for doing OCR with bounding boxes. [1] The resulting boxes aren't quite as accurate as something like Textract/Surya, but I've found they're much more accurate than Gemini or any other LLM.

[1] https://qwenlm.github.io/blog/qwen2.5-vl/
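If you want to pull the boxes back out of that HTML, here's a rough sketch using only the Python stdlib. It assumes each element carries a `data-bbox="x1 y1 x2 y2"` attribute in pixel coordinates, based on my reading of the examples in the blog post cited as [1]. Check the post for the exact format before relying on it. `extract_boxes`, `BBoxExtractor`, and `parse_bbox` are just illustrative names, not part of any Qwen API.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); assumed pixel coordinates
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags that never get a closing tag


def parse_bbox(value: Optional[str]) -> Optional[Box]:
    """Parse an assumed 'x1 y1 x2 y2' data-bbox value; return None if it is malformed."""
    if not value:
        return None
    parts = value.split()
    if len(parts) != 4:
        return None
    try:
        x1, y1, x2, y2 = (int(float(p)) for p in parts)
    except (ValueError, OverflowError):
        return None
    return (x1, y1, x2, y2)


class BBoxExtractor(HTMLParser):
    """Collect (text, box) pairs for every element that has a data-bbox attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Tuple[str, Box]] = []
        # One entry per open element: (tag, parsed box or None, text chunks seen inside it)
        self._stack: List[Tuple[str, Optional[Box], List[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._stack.append((tag, parse_bbox(dict(attrs).get("data-bbox")), []))

    def handle_data(self, data):
        # Credit text to every open element, so a box's text includes its nested children
        for _, _, chunks in self._stack:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(t != tag for t, _, _ in self._stack):
            return  # ignore stray closing tags
        # Pop until the matching open tag, tolerating unclosed children in model output
        while self._stack:
            open_tag, box, chunks = self._stack.pop()
            if box is not None:
                self.results.append(("".join(chunks).strip(), box))
            if open_tag == tag:
                break


def extract_boxes(html: str) -> List[Tuple[str, Box]]:
    """Return (text, box) pairs in the order their elements close."""
    parser = BBoxExtractor()
    parser.feed(html)
    parser.close()
    return parser.results
```

Usage would be something like `extract_boxes(model_output)`, after stripping any Markdown code fence the model wraps around its HTML. I went with the stdlib parser over regexes because model output isn't always well-formed, and the stack lets unclosed tags degrade gracefully instead of swallowing the rest of the page.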