HN Mirror



	by ekidd 480 days ago
	I've been experimenting with vlm-run (plus custom form definitions), and it works surprisingly well with Gemini 2.0 Flash. Costs, as I understand it, are also quite low for Gemini. You'll get the best results with simple to medium-complexity forms, roughly the same ones you could ask a human to process with less than 10 minutes of training. If you need something like this, it's definitely good enough that you should consider kicking the tires.
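
	For anyone wondering what a "custom form definition" might look like in practice, here is a minimal sketch using only the Python standard library. It is not vlm-run's actual API: `InvoiceForm`, its fields, and `parse_form` are hypothetical names made up for illustration, and the sketch assumes the model has already returned its extraction as a JSON string.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class InvoiceForm:
    # Hypothetical form definition; the field names are illustrative only.
    invoice_number: str
    vendor_name: str
    total_amount: float


def parse_form(raw_json: str) -> InvoiceForm:
    """Validate a model's JSON output against the form definition."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(InvoiceForm)}
    missing = expected - set(data)
    if missing:
        # Fail loudly instead of silently accepting a partial extraction.
        raise ValueError(f"missing fields: {sorted(missing)}")
    return InvoiceForm(
        invoice_number=str(data["invoice_number"]),
        vendor_name=str(data["vendor_name"]),
        # Models often return numbers as strings, so coerce explicitly.
        total_amount=float(data["total_amount"]),
    )
```

	The point of keeping the definition this small is the same as the 10-minute rule above: a form with a handful of clearly named fields is easy to check mechanically, so bad extractions surface as errors rather than as quietly wrong data.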

2 comments

It gives you an idea of where today's models fail (Gemini Flash, OpenAI GPT-4o and GPT-4o mini, open-source ones like Llama 3.2 Vision and Qwen2.5-VL, etc.).

Very cool! If you have more examples / schemas you'd be interested in sharing, feel free to add to the `contrib` section.