by smpanaro 900 days ago

MobileVLM [1] is another recent small multimodal model. Its authors trained their own 1.4B/2.7B LLaMA models from scratch using RedPajama and Vicuna instead of leveraging Phi-2.

The two papers share only one benchmark (GQA, where MobileVLM scores better), so it's hard to say how they compare otherwise.

[1] https://arxiv.org/abs/2312.16886