by 3abiton 980 days ago
This is the first open-source multimodal model I've heard about. Are there other alternatives already?

The Fuyu pre-trained model is not open source. At best, it is source-available. It's also not the only multimodal model you can run locally.

A few other examples include LLaVA[0], IDEFICS[1][2], and CogVLM[3]. MiniGPT-4[4] might be another one to look at. I'm pretty sure all of these have better licenses than Fuyu. Fuyu's architecture does sound really interesting, but the license on the pre-trained model is a non-starter for almost any use.

[0]: https://github.com/haotian-liu/LLaVA

[1]: https://huggingface.co/blog/idefics

[2]: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct

[3]: https://github.com/THUDM/CogVLM

[4]: https://github.com/Vision-CAIR/MiniGPT-4