Hacker News
by
jabron
204 days ago
What do you mean "bounding boxes"? They were talking about captions and embeddings, so a vision language model is required.
Glemkloksdjf
203 days ago
I suggested YOLO and other non-VLM models as a much faster alternative.
Of course, CLIP would be the other alternative to a big vision language model.