The Fuyu pre-trained model is not open source. At best, it is source-available. It's also not the only multimodal model you can run locally.
A few other examples include LLaVA[0], IDEFICS[1][2], and CogVLM[3]. MiniGPT-4[4] might be another one to look at. I'm pretty sure all of these have better licenses than Fuyu. Fuyu's architecture does sound really interesting, but the license on the pre-trained model is a complete non-starter for almost anything.
[0]: https://github.com/haotian-liu/LLaVA
[1]: https://huggingface.co/blog/idefics
[2]: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct
[3]: https://github.com/THUDM/CogVLM
[4]: https://github.com/Vision-CAIR/MiniGPT-4