


	by coder543 901 days ago
	Being multimodal doesn’t seem to require much of a size penalty (see https://github.com/dlyuangod/TinyGPT-V). Even so, Google treats Gemini Pro Vision as a separate model from Gemini Pro, so it could have separate parameters dedicated to vision (as CogVLM does), which wouldn’t affect the size of the model as far as text tasks are concerned.