Hacker News
by ArnavAgrawal03 328 days ago
Multimodal RAG is exactly what we argue for. In their original state, though, multivectors (which form the basis for multimodal RAG) are very unwieldy: computing the similarity scores is very expensive, so scaling them up in this state is hard.
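To make the cost concrete, here is a minimal sketch, assuming the similarity in question is ColBERT-style late interaction (MaxSim). That scoring rule is an assumption for illustration and is not stated above, and the names `maxsim_score` and `dot` are hypothetical. Every query vector is compared against every document vector, so scoring one document costs len(query) * len(doc) dot products instead of one.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score (assumed scoring rule, for illustration).

    For each query vector, keep its best match among the document
    vectors, then sum those best matches.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data: 2 query vectors, 3 document vectors, 2 dimensions each.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]]

score = maxsim_score(query, doc)
dot_products_needed = len(query) * len(doc)  # 6 here, versus 1 for single-vector search
```

With hundreds of vectors per page and millions of pages, that product is what makes brute-force scoring impractical.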

You need to apply things like quantization, single-vector conversions (using fixed-dimensional encodings), and better indexing to ensure that multimodal RAG works at scale.
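As a rough illustration of the quantization piece only, here is a sketch of sign-based binary quantization with Hamming distance. This is one common scheme picked for the example, not necessarily the one used in any particular system, and `binarize` and `hamming` are hypothetical helper names.

```python
from typing import List


def binarize(vec: List[float]) -> int:
    """Pack the sign of each component into one bit of an int (1 if > 0)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a cheap proxy for vector dissimilarity."""
    return bin(a ^ b).count("1")


# Two toy 4-dimensional embeddings.
a = binarize([0.3, -1.2, 0.8, -0.1])  # sign bits 1,0,1,0
b = binarize([0.5, 0.2, 0.9, -0.4])   # sign bits 1,1,1,0

distance = hamming(a, b)
```

Each float collapses to one bit, so storage shrinks by roughly 32x for float32 inputs and comparisons become XOR plus a bit count. The trade-off is lost precision, which is usually recovered by re-ranking a shortlist with the full vectors.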

That is exactly what we're doing at Morphik :)

1 comment

And the Gemini(s) aren't already doing this at GoogleCorp?