by ekidd
480 days ago
I've been experimenting with vlm-run (plus custom form definitions), and it works surprisingly well with Gemini 2.0 Flash. As I understand it, Gemini's costs are also quite low. You'll get the best results with simple to medium-complexity forms, roughly the same ones you could ask a human to process with less than 10 minutes of training. If you need something like this, it's definitely good enough that you should consider kicking the tires.

It gives you an idea of where today's models fail (Gemini Flash, OpenAI GPT-4o and GPT-4o mini, open-source ones like Llama 3.2 Vision, Qwen2.5-VL, etc.).