by ashvardanian
268 days ago
Qwen models have historically been pretty good, but I don't see any architectural novelty here, unless I'm missing something. It looks like another vision encoder with a projection layer feeding a large autoregressive model. Have there been any better ideas in the VLM space recently? I've been away for a couple of years :(