by rldjbpin
522 days ago
reads to me like 95% of the work was done by "conventional AI", and the llm applied at the end seems to work like a lucky three-sided die. when "embeddings" are used to perform a closeness test, you are using a pretrained computer vision model behind the scenes. it is doing the vast majority of the work, filtering hundreds of images down to a handful. the visual llm works on textual descriptions, which seem far too close to each other for similar images. regardless, more power to the team for finding something that works for them.
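
to make the "closeness test" concrete, here is a minimal sketch of what i mean by embedding-based filtering. it assumes each image has already been turned into a vector by some pretrained vision model; the toy vectors, image names, and the `top_k_closest` helper are made up for illustration and are not the team's actual pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_closest(query, candidates, k=3):
    """Return the k (name, score) pairs most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real models emit hundreds of dimensions.
query = [1.0, 0.0, 0.0]
candidates = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
    "img_d": [-1.0, 0.0, 0.0],
}

# Only this shortlist would be handed to the later (llm) stage.
shortlist = top_k_closest(query, candidates, k=2)
```

the point being that by the time anything reaches the llm, the ranking step has already thrown away most of the candidates.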
|
SOTA V-LLMs do not work on textual descriptions. They take the image itself as input, typically as embeddings from a vision encoder.