HN Mirror



	by jabron 249 days ago
	What do you mean by "bounding boxes"? They were talking about captions and embeddings, so a vision-language model is required.

1 comment

I suggested YOLO and other non-LLM vision models as a much faster alternative.

Of course, CLIP would be the other option besides a large vision-language LLM.