MultiModal-GPT: A Vision and Language Model for Dialogue with Humans (github.com)
4 points by vov_or 1135 days ago
1 comment

The authors trained a multi-modal chatbot using visual and language instructions, building on the open-source multi-modal model OpenFlamingo!

Paper link: https://arxiv.org/abs/2305.04790