| HN Mirror


by janalsncm 205 days ago

> Chinese models typically focus on text

Not true at all. Qwen has a VLM (Qwen2-VL-Instruct), which is the backbone of ByteDance’s TARS computer-use model. Both Alibaba (Qwen) and ByteDance are Chinese.

Also, DeepSeek got a ton of attention with their OCR paper a month ago, which was an explicit example of using images rather than text.