	by ashvardanian 268 days ago
	Qwen models have historically been pretty good, but unless I'm missing something, there's no architectural novelty here. It looks like yet another vision encoder with a projection layer feeding a large autoregressive model. Have there been any better ideas in the VLM space recently? I've been away for a couple of years :(