Hacker News
by rocauc 614 days ago
A suggestion: I'd swap llava for Florence-2 for your open-set text descriptions. Florence-2 seems uniformly more descriptive in its outputs.

They are using Ollama, which is based on llama.cpp; Florence-2 is not supported on that backend.
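For context, here is a minimal sketch of the llava side of such a pipeline. It assumes a local Ollama server on its default port (11434) with the `llava` model already pulled, and uses Ollama's `/api/generate` endpoint with a base64-encoded image. `build_llava_request` and `describe_image` are hypothetical helper names, not part of any library.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port with `llava` pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build a non-streaming /api/generate payload carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a token stream
    }


def describe_image(image_bytes: bytes,
                   prompt: str = "Describe everything visible in this image.") -> str:
    """POST the payload to Ollama and return the generated description text."""
    payload = json.dumps(build_llava_request(image_bytes, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the generated text is in the "response" field.
        return json.loads(resp.read())["response"]
```

Usage would be something like `describe_image(open("photo.jpg", "rb").read())`. Because the model is reached only through Ollama, swapping in Florence-2 would mean running it through a different runtime rather than changing the `model` string.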
I found grounding-dino both better and faster than Florence.
I found YOLOS to be faster and better: not real-time, but 22k objects in under half a second.