|
|
|
|
|
by mahjongmen
428 days ago
|
|
Which benchmark are you referring to? Voyage-3-large is a text-only and much larger model than Embed-v4. If you want to unlock multimodality with Voyage-3-large, you'd have to either OCR (really bad results usually) or use a VLM to parse your data into textual descriptions (this works alright, but the cost of using a VLM will jack-up your data-pre-processing costs). |
|
Not to mention an image is optimistically 50 KB vs the same page represented as markdown is maybe 2–5 KB. When you're talking about pulling in potentially hundreds of pages, that's a 10–20x increase in storage, memory usage, and network overhead.
I do wish they had a more head-to-head comparison with voyage. I think they're the de facto king of proprietary embeddings and with Mongo having bought them, I'd love to migrate away once someone can match their performance.