Hacker News
by wtcactus 462 days ago
The claim of “strongest” (what does that even mean?) seems moot. I don’t think a multimodal model is the way to go on a single home GPU.

I would much rather have specialized models for different scenarios that could be loaded into the GPU when needed. It’s a waste of parameters to have half of the VRAM filled with parts of the model that target image generation when all I want to do is write code.
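
To make the "load only what you need" idea concrete, here's a minimal sketch in plain Python (no real GPU calls) of keeping task-specific models inside a fixed VRAM budget and evicting the least recently used one when a new model won't fit. The task names, parameter counts and the 2 bytes per parameter (16-bit weights) figure are hypothetical assumptions, and it only counts weight memory, ignoring activations and the KV cache.

```python
from collections import OrderedDict

# Hypothetical registry: task -> parameter count (illustrative numbers only).
MODELS = {
    "code": 7_000_000_000,
    "chat": 7_000_000_000,
    "image": 3_000_000_000,
}

BYTES_PER_PARAM = 2  # assumption: 16-bit weights; activations/KV cache ignored


def weight_gb(params: int) -> float:
    """Approximate memory for the weights alone, in GB (1e9 bytes)."""
    return params * BYTES_PER_PARAM / 1e9


class VramCache:
    """Keeps task-specific models resident within a VRAM budget,
    evicting the least recently used model when a new one won't fit."""

    def __init__(self, budget_gb: float):
        self.budget_gb = budget_gb
        self.resident = OrderedDict()  # task -> GB, oldest use first

    def used_gb(self) -> float:
        return sum(self.resident.values())

    def load(self, task: str) -> list:
        """Make `task`'s model resident; return the tasks that were evicted."""
        if task in self.resident:
            self.resident.move_to_end(task)  # mark as most recently used
            return []
        need = weight_gb(MODELS[task])
        if need > self.budget_gb:
            raise MemoryError(f"{task} needs {need:.1f} GB, budget is {self.budget_gb} GB")
        evicted = []
        while self.used_gb() + need > self.budget_gb:
            name, _ = self.resident.popitem(last=False)  # drop least recently used
            evicted.append(name)
        self.resident[task] = need
        return evicted
```

With a hypothetical 16 GB card, `VramCache(16).load("code")` takes 14 GB, and a later `load("image")` evicts the code model first, which is the swap-in/swap-out behaviour I'd prefer over keeping image weights resident while writing code.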

That's interesting. Are they often an amalgam of image & text tokens? Because, yeah, image generation is not interesting to me at all.
Perhaps the model performs better (has higher intelligence) if it is trained on a more diverse set of topics (?)