Hacker News
by Workaccount2 207 days ago
I do a test on multimodal LLMs where I show them a dog with 5 legs, and ask them to count how many legs the dog has. So far none of them can do it. They all say "4 legs".

Segment Anything, however, was able to segment all 5 dog legs when prompted to. That suggests Meta is doing something different under the hood here, which may lend itself to a very powerful future LLM.

Right now some of the biggest complaints people have with LLMs stem from their incompetence at processing visual data. Maybe Meta is onto something here.

3 comments

Segmentation doesn't need to count legs. I'd guess something like YOLO could segment 5-legged dogs too.
YOLO is not a segmentation model.
Thanks! TIL there's a class of segmentation models with the YOLO naming scheme.
I thought it was a joke about YAML
Lol you obviously haven't seen what cheats for FPS games look like in the last 3 years.

https://github.com/Babyhamsta/Aimmy

You don’t need segmentation to count legs. Object detection can do that. DeepLabCut from 2020 perhaps.
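
For what it's worth, here is a minimal sketch of the counting step once a detector has run. The detections list is made up for illustration and the "leg" label and 0.5 threshold are assumptions; a real detector would need to be trained on a leg class and would return boxes as well.

```python
from collections import Counter

# Hypothetical detector output for one image: (label, confidence) pairs.
# These values are invented for illustration, not produced by a real model.
detections = [
    ("leg", 0.94),
    ("leg", 0.91),
    ("leg", 0.88),
    ("leg", 0.83),
    ("leg", 0.79),
    ("head", 0.97),
    ("leg", 0.21),  # low-confidence hit, filtered out by the threshold below
]


def count_by_label(detections, min_confidence=0.5):
    """Count detections per label, ignoring those below min_confidence."""
    return Counter(
        label for label, confidence in detections if confidence >= min_confidence
    )


counts = count_by_label(detections)
print(counts["leg"])
```

The point is that counting is just a tally over per-instance outputs, so the answer depends on what the detector finds rather than on a prior that dogs have 4 legs.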
I doubt that Gemini 3 can't do it.