Hacker News
by Glemkloksdjf 207 days ago
Any current one. They are easy to use, and you can just benchmark them yourself.

I'm using the small and medium variants.

Also, the code for using them is very short and simple. You can also use ChatGPT to generate small experiments to see what fits your case better.
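If you want to benchmark the variants yourself, a small timing harness is enough. This is a sketch under assumptions, not a definitive setup: `benchmark` and `compare` are hypothetical helper names, and `fake_small`/`fake_medium` are stand-ins so it runs without any model installed. With Ultralytics you would pass your loaded models instead (e.g. `YOLO("yolov8s.pt")`, assuming those weight names), since the model object can be called on an image.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple


def benchmark(predict: Callable[[object], object],
              inputs: Sequence[object],
              warmup: int = 2,
              runs: int = 5) -> Dict[str, float]:
    """Time predict(x) for every x in inputs; return per-item latency stats in ms."""
    if not inputs:
        raise ValueError("need at least one input")
    # Warm-up calls are not timed: the first calls often pay one-off costs
    # such as weight loading or GPU initialization.
    for _ in range(warmup):
        for x in inputs:
            predict(x)
    samples: List[float] = []
    for _ in range(runs):
        for x in inputs:
            start = time.perf_counter()
            predict(x)
            samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.fmean(samples),
        "max_ms": max(samples),
    }


def compare(models: Dict[str, Callable[[object], object]],
            inputs: Sequence[object]) -> List[Tuple[str, float]]:
    """Benchmark each named model; return (name, median_ms) pairs, fastest first."""
    rows = [(name, benchmark(fn, inputs)["median_ms"]) for name, fn in models.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs without a model installed.
    # Swap in your loaded models (e.g. Ultralytics YOLO objects) and real images.
    def fake_small(x):
        time.sleep(0.001)

    def fake_medium(x):
        time.sleep(0.003)

    for name, median_ms in compare({"small": fake_small, "medium": fake_medium}, [0, 1, 2]):
        print(f"{name}: {median_ms:.2f} ms per image (median)")
```

The median is reported because warm-up effects and occasional outliers skew the mean. Run it on your own images and hardware, since the small vs. medium trade-off depends on both.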


There aren’t any YOLO models for captioning, and the other models aren’t robust enough to serve as good embedding models.
You can get labels out of the classifier and bounding box models.
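A sketch of turning detector output into labels you can search on, assuming detections arrive as (class_id, confidence) pairs plus an id-to-name mapping. The helper names, the 0.25 threshold, and the sample detections are hypothetical. With Ultralytics, the pairs would likely come from a result's `boxes.cls` and `boxes.conf` (converted to plain Python numbers) and the mapping from `model.names`, but treat that wiring as an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def detections_to_labels(detections: Iterable[Tuple[int, float]],
                         names: Dict[int, str],
                         min_conf: float = 0.25) -> List[str]:
    """Keep detections at or above min_conf and return their unique class names, sorted."""
    # int() because some detectors report class ids as floats.
    return sorted({names[int(cls)] for cls, conf in detections if conf >= min_conf})


def build_label_index(per_image: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert image -> labels into label -> images, for simple tag-based search."""
    index: Dict[str, List[str]] = {}
    for image, labels in per_image.items():
        for label in labels:
            index.setdefault(label, []).append(image)
    return index


# Hypothetical class-id mapping and detections, for illustration only.
names = {0: "person", 2: "car", 16: "dog"}
per_image = {
    "a.jpg": detections_to_labels([(0, 0.91), (16, 0.80), (2, 0.10)], names),
    "b.jpg": detections_to_labels([(2, 0.70), (0, 0.55)], names),
}
index = build_label_index(per_image)
print(per_image)  # {'a.jpg': ['dog', 'person'], 'b.jpg': ['car', 'person']}
print(index["person"])  # ['a.jpg', 'b.jpg']
```

This only gives you tags for the classes the detector was trained on, which is the trade-off against captioning or embedding models.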

They are super fast.

It's just an alternative I'm mentioning; I'd assume the person asking knows a little about the domain.

Otherwise, I'd assume the first option would be CLIP. Vision-language LLMs are just very slow and compute-intensive.
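For the CLIP route, the usual zero-shot recipe is to embed the image and one text prompt per label, then take a softmax over scaled cosine similarities; the same cosine score also ranks images against a text query for search. In this sketch the embeddings are toy 3-d vectors standing in for real CLIP encoder outputs, the function names are hypothetical, and `scale=100.0` is an assumed value in the range CLIP uses for its logit scale.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def zero_shot(image_emb: Sequence[float],
              prompt_embs: Dict[str, Sequence[float]],
              scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled image-text cosine similarities (CLIP-style zero-shot)."""
    logits = {label: scale * cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    top = max(logits.values())  # subtract the max for a numerically stable softmax
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


def search(query_emb: Sequence[float],
           image_embs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank images by cosine similarity to a text-query embedding, best first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in image_embs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real CLIP image/text embeddings.
probs = zero_shot([1.0, 0.2, 0.0], {"a photo of a dog": [0.9, 0.1, 0.0],
                                     "a photo of a car": [0.0, 0.1, 1.0]})
print(max(probs, key=probs.get))  # a photo of a dog
hits = search([0.0, 0.1, 1.0], {"x.jpg": [1.0, 0.2, 0.0], "y.jpg": [0.1, 0.1, 0.9]})
print(hits[0][0])  # y.jpg
```

In practice you would compute the image embeddings once, store them, and only embed the text query at search time, which is much cheaper than running a vision-language LLM per image.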