by jbarrow
491 days ago
I've been very impressed by Gemini 2.0 Flash for multimodal tasks, including object detection and localization [1], as well as document tasks. But the 15-requests-per-minute limit was a severe constraint while it was experimental, so I'm really excited to be able to actually _do_ things with the model. In my experience, I'd reach for Gemini 2.0 Flash over 4o in a lot of multimodal/document use cases, especially given the price difference: $0.10/million input and $0.40/million output tokens versus $2.50/million input and $10.00/million output tokens, so 4o costs 25x more on both sides. That said, Qwen2.5 VL 72B and 7B seem even better at document image tasks and localization.

[1] https://notes.penpusher.app/Misc/Google+Gemini+101+-+Object+...
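To make the price gap concrete, here's a rough cost sketch using the per-million-token prices quoted above. The model keys are just labels I picked (not official API model IDs), and the workload (10,000 pages at ~1,500 input and ~500 output tokens each) is a made-up assumption, so plug in your own numbers.

```python
# Per-million-token prices (USD) as quoted in the comment; labels are arbitrary, not API IDs.
PRICES = {
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload given total input and output token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical document workload: 10,000 pages, ~1,500 input / ~500 output tokens per page.
    pages, tokens_in, tokens_out = 10_000, 1_500, 500
    for model in PRICES:
        total = cost_usd(model, pages * tokens_in, pages * tokens_out)
        print(f"{model}: ${total:,.2f}")
```

Because both the input and output prices differ by the same factor (2.50 / 0.10 = 10.00 / 0.40 = 25), the ratio stays at 25x regardless of the input/output mix; for this made-up workload it works out to $3.50 versus $87.50.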
Why not use o1-mini?