by axiom92 971 days ago
Right, but no separate image encoder + half the size could be very helpful for many applications.
1 comment

The 7B LLaVA model is smaller, even counting its image encoder (CLIP-L).