by eternauta3k 1157 days ago
Interesting, so the LLM is "just" getting your question plus a normal text description of the image (as vectors)?

At a high level, yes.

More precisely, it gets the question after it's passed through a matrix that transforms the text description of the image so the LLM can “understand” it.

It maps from the space of one ML model to the other.
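
To make the "matrix between two spaces" idea concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the dimensions, the names (`make_projection`, `project`, `build_llm_input`), and the random weights. In a real system the matrix would be learned during training, and the vectors would come from an actual image encoder and the LLM's own embedding table.

```python
import random

VISION_DIM = 4  # hypothetical size of the image encoder's output vectors
LLM_DIM = 6     # hypothetical size of the LLM's token embeddings


def make_projection(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix. Random here; learned in a real model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vec):
    """Matrix-vector product: map one vector from the image space to the LLM space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def build_llm_input(image_vectors, question_embeddings, matrix):
    """Project each image vector, then place the results in front of the question."""
    projected = [project(matrix, v) for v in image_vectors]
    return projected + question_embeddings


W = make_projection(VISION_DIM, LLM_DIM)

# Stand-ins for real data: two image vectors and a three-token question.
image_vectors = [[0.1, -0.3, 0.7, 0.2], [0.5, 0.0, -0.1, 0.9]]
question_embeddings = [[0.0] * LLM_DIM for _ in range(3)]

llm_input = build_llm_input(image_vectors, question_embeddings, W)
print(len(llm_input), len(llm_input[0]))  # 5 vectors, each of size LLM_DIM
```

The point is only that after the projection, the image vectors have the same size as the question's embeddings, so the LLM can consume them as one sequence.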