Hacker News
by rldjbpin 522 days ago
reads to me like "conventional AI" did 95% of the work on this problem, and using the llm at the end seems to work like a lucky roll of a three-sided die.

when "embeddings" are used to perform a closeness test, you are using a pretrained computer vision model behind the scenes. it does the vast majority of the work, filtering hundreds of images down to a handful.
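The embedding-based filtering step can be sketched as a nearest-neighbour search over vectors. This is a minimal sketch, assuming each image has already been turned into a fixed-length embedding by some pretrained vision model. Here random vectors stand in for those embeddings, and `cosine_similarity` and `top_k_candidates` are hypothetical helper names, not anything from the team's actual pipeline.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_candidates(query, gallery, k=5):
    """Return the ids of the k gallery images closest to the query.

    gallery: list of (image_id, embedding) pairs.
    """
    scored = [(cosine_similarity(query, emb), image_id) for image_id, emb in gallery]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:k]]

# Hypothetical stand-in for a pretrained vision model's output:
# random 64-dim embeddings for 300 images.
random.seed(0)
dim = 64
gallery = [(f"img_{i}", [random.gauss(0, 1) for _ in range(dim)]) for i in range(300)]

# Query: a slightly perturbed copy of img_42's embedding,
# i.e. a "near-duplicate" of that image.
query = [x + random.gauss(0, 0.1) for x in gallery[42][1]]

shortlist = top_k_candidates(query, gallery, k=5)
print(shortlist)  # img_42 should rank first
```

Only this cheap vector comparison touches all 300 images; whatever comes afterwards (e.g. the llm step) only has to look at the handful in `shortlist`.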

the visual llm works on textual descriptions, which seem far too close to one another for similar images. regardless, more power to the team for finding something that works for them.


>visual llm works on textual descriptions

SOTA (state-of-the-art) V-LLMs do not work on textual descriptions; they take the image itself as input.
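To make the distinction concrete: many current vision-language models use a ViT-style encoder that splits the image into fixed-size patches and projects each patch into a vector that the model consumes as a token. The model therefore sees pixel-derived embeddings, not a caption. The following is a minimal sketch of just that patch-embedding step, under assumed toy sizes and random weights. A real model would use learned projection weights, add position embeddings, and run a full transformer afterwards. `patchify` and `linear_project` are hypothetical helper names.

```python
import random

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors.

    Patches are read left-to-right, top-to-bottom; H and W must be
    multiples of `patch`.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])  # append all channels of this pixel
            patches.append(vec)
    return patches

def linear_project(vectors, weights):
    """Map each flattened patch to a D-dim token using a (D x in_dim) weight matrix."""
    return [[sum(w * x for w, x in zip(row, v)) for row in weights] for v in vectors]

random.seed(0)
H = W = 32  # toy image size (assumption)
C = 3       # RGB channels
P = 8       # patch size (assumption)
D = 16      # token dimension (assumption)

# Random pixel values stand in for a real image.
image = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]

patches = patchify(image, P)  # (32/8) * (32/8) = 16 patches
in_dim = P * P * C            # 8 * 8 * 3 = 192 values per patch
weights = [[random.gauss(0, 0.02) for _ in range(in_dim)] for _ in range(D)]
tokens = linear_project(patches, weights)

print(len(tokens), len(tokens[0]))  # 16 16
```

Each of the 16 tokens is computed directly from pixel values, so no text description of the image is involved at this stage.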