Hacker News
by omneity 399 days ago
Have you tried open weights vision models such as Qwen VL, MiniCPM, PaliGemma...?

I'm also curious how usable simpler vision models such as Florence are, in case you explored this direction.

2 comments

I actually haven't, but Nova from Amazon was surprisingly good at things like bounding boxes compared to some others. You kind of have to test and measure so many different aspects to get the best at specific tasks. Thanks for the idea.
we're currently in the process of doing this. i think something that could potentially work is to iterate on the initial image composition / structure using cheaper models, and then upscale at the end. this way you save on iteration cost but still land on a higher-scale image.