by GaggiX 980 days ago
>This is by far the best open source vlm model

LLaVA 1.5 is very good, at least at describing images. http://llava.hliu.cc/

Right, but having no separate image encoder and being half the size could be very helpful for many applications.
The 7B LLaVA model is smaller, even counting the image encoder (CLIP-L).