by janalsncm
205 days ago
> Chinese models typically focus on text

Not true at all. Qwen has a VLM (Qwen2-VL-Instruct), which is the backbone of ByteDance’s TARS computer-use model. Both Alibaba (Qwen) and ByteDance are Chinese. Also, DeepSeek got a ton of attention with their OCR paper a month ago, which was an explicit example of using images rather than text.