by reissbaker 642 days ago
Couple notes for newcomers:

1. This is a VLM, not a text-to-image model. You can give it images, and it can understand them. It doesn't generate images back.

2. Pixtral 12B seems to benchmark significantly below Qwen2-VL-7B [1], so if you want the best local model for understanding images, probably use Qwen2-VL-7B. If you want a large open-source model, Qwen2-VL-72B is most likely the best option.

[1]: https://qwenlm.github.io/blog/qwen2-vl/

1 comment

>If you want a large open-source model, Qwen2-VL-72B is most likely the best option.

Only the 2B and 7B models have been "open sourced". From your link:

>We opensource Qwen2-VL-2B and Qwen2-VL-7B with Apache 2.0 license, and we release the API of Qwen2-VL-72B!