by smpanaro 900 days ago
MobileVLM [1] is another recent small multimodal model. They trained their own 1.4B/2.7B LLaMA models from scratch using RedPajama and Vicuna instead of leveraging Phi-2.

The papers only have one benchmark in common (GQA, where MobileVLM scores better), so it's hard to say how they compare otherwise.

[1] https://arxiv.org/abs/2312.16886