HN Mirror



	by omneity 399 days ago
	Have you tried open-weights vision models such as Qwen VL, MiniCPM, or PaliGemma? I'm also curious how usable simpler vision models such as Florence are, in case you explored this direction.

2 comments

shmoogy 397 days ago

I actually haven't, but Nova from Amazon was surprisingly good at things like bounding boxes compared to some others. You kind of have to test and measure so many different aspects to find what works best for specific tasks. Thanks for the idea.


palashshah 399 days ago

We're currently in the process of doing this. I think something that could work is to iterate on the initial image composition and structure using cheaper models, and then upscale at the end. This way you save on the iteration cost but still end up with a higher-resolution image.
