by famouswaffles 980 days ago
Oh wow. This seems to be the best released VLM. The chart/UI understanding displayed in particular is superb.

>This is by far the best open source vlm model

LLaVA 1.5 is very good, at least at describing images. http://llava.hliu.cc/

Right, but no separate image encoder + half the size could be very helpful for many applications.
The 7B LLaVA model is smaller, even considering the image encoder (CLIP-L).