| HN Mirror



	by wtcactus 462 days ago
	The claim of “strongest” (what does that even mean?) seems moot. I don’t think a multimodal model is the way to go for a single home GPU. I would much rather have specifically tailored models for different scenarios that could be loaded into the GPU when needed. It’s a waste of parameters to have half of the VRAM loaded with parts of the model targeting image generation when all I want to do is write code.

2 comments

That's interesting. Are they often an amalgam of image & text tokens? Because, yeah, image generation is not interesting to me at all.

Perhaps the model performs better (has higher intelligence) if it is trained on a more diverse set of topics?