by fzysingularity
213 days ago
We ran a small visual benchmark [1] of GPT, Gemini, Claude, and our new visual agent Orion [2] on a handful of visual tasks: object detection, segmentation, OCR, image/video generation, and multi-step visual reasoning.

The surprising part: models that ace benchmarks often fail on seemingly trivial visual tasks, while others succeed in unexpected places. We show concrete examples, side-by-side outputs, and how each model breaks when chaining multiple visual steps.

We go into more detail in our technical whitepaper [3]. Play around with Orion for free here [4].

[1] Showdown: https://chat.vlm.run/showdown
[2] Learn about Orion: https://vlm.run/orion
[3] Technical whitepaper: https://vlm.run/orion/whitepaper
[4] Chat with Orion: https://chat.vlm.run/

Happy to answer questions or dig into specific cases in the comments.