by Workaccount2 207 days ago

I run a test on multimodal LLMs where I show them a picture of a dog with 5 legs and ask them how many legs the dog has. So far none of them get it right. They all say "4 legs".

Segment Anything, however, was able to segment all 5 of the dog's legs when prompted to. That suggests Meta is doing something different under the hood here, and the approach may lend itself to a very powerful future LLM.

Right now some of the biggest complaints people have about LLMs stem from their incompetence at processing visual data. Maybe Meta is onto something here.

3 comments

jampekka 207 days ago

Segmentation doesn't need to count legs. I'd guess something like YOLO could segment 5-legged dogs too.
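
To make that point concrete, here is a toy sketch. It is not how Segment Anything or YOLO work internally; it only assumes a hypothetical binary "leg mask" (`LEG_MASK`) and a made-up helper (`label_regions`) that groups touching pixels. Each region is found purely from local connectivity, and the count only appears afterwards as the length of the result.

```python
from collections import deque


def label_regions(grid):
    """Group 4-connected cells equal to 1 into regions (sets of (row, col))."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            # Flood fill from this unvisited foreground cell.
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


# Hypothetical toy mask: 1 marks "leg" pixels, five separate vertical strips.
LEG_MASK = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
]

regions = label_regions(LEG_MASK)
# Counting is a separate step layered on top of the segments.
print(len(regions))
```

Nothing in the labeling step carries a prior like "dogs have 4 legs", which is one plausible reason a segmenter is not thrown off by a fifth leg.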

chompychop 207 days ago

YOLO is not a segmentation model.

jampekka 207 days ago

https://docs.ultralytics.com/tasks/segment/

chompychop 207 days ago

Thanks! TIL there's a class of segmentation models with the YOLO naming scheme.