

	by mahjongmen 473 days ago
	Which benchmark are you referring to? Voyage-3-large is a text-only model and much larger than Embed-v4. If you want to unlock multimodality with Voyage-3-large, you'd have to either OCR your data (usually really bad results) or use a VLM to parse it into textual descriptions (this works alright, but the cost of using a VLM will jack up your data pre-processing costs).

2 comments

serjester 473 days ago

I think anyone who cares enough about embedding performance to use niche models is probably parsing their PDFs into some sort of textual format. Otherwise you need to orient all your pipelines to handle images, which adds significant complexity (hybrid search, reranking, LLM calls, etc. are all way harder with images).

Not to mention that an image is optimistically 50 KB, whereas the same page represented as markdown is maybe 2–5 KB. When you're talking about pulling in potentially hundreds of pages, that's a 10–25x increase in storage, memory usage, and network overhead.
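To make the arithmetic concrete, here is a rough sketch using only the per-page sizes quoted above. The 300-page figure and the helper names are illustrative assumptions, not measurements. With these inputs the ratio comes out to about 10x at the 5 KB end and 25x at the 2 KB end.

```python
# Back-of-the-envelope sizes taken from the comment above; real pages vary widely.
IMAGE_KB = 50            # one page stored as an image (optimistic estimate)
MARKDOWN_KB = (2, 5)     # the same page as markdown: low and high estimate


def overhead_ratio(image_kb: float, text_kb: float) -> float:
    """How many times larger the image is than the text version of a page."""
    return image_kb / text_kb


def payload_kb(pages: int, kb_per_page: float) -> float:
    """Total payload in KB for `pages` pages at a given per-page size."""
    return pages * kb_per_page


low_text, high_text = MARKDOWN_KB

# Smallest ratio uses the largest markdown size, and vice versa.
ratios = (overhead_ratio(IMAGE_KB, high_text), overhead_ratio(IMAGE_KB, low_text))

PAGES = 300  # hypothetical stand-in for "hundreds of pages"
image_total = payload_kb(PAGES, IMAGE_KB)
text_total = (payload_kb(PAGES, low_text), payload_kb(PAGES, high_text))

if __name__ == "__main__":
    print(f"image/text ratio: {ratios[0]:.0f}x to {ratios[1]:.0f}x")
    print(f"{PAGES} pages as images:   {image_total:.0f} KB")
    print(f"{PAGES} pages as markdown: {text_total[0]:.0f} to {text_total[1]:.0f} KB")
```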

I do wish they had a more head-to-head comparison with Voyage. I think they're the de facto king of proprietary embeddings, and with Mongo having bought them, I'd love to migrate away once someone can match their performance.


mahjongmen 473 days ago

Hey Serjester, email me at elliott@cohere.ai and let's arrange a time to chat. We did head-to-head evals against Voyage Large / Voyage Multimodal, and I can share them with you if you are serious about moving your embeddings over. We tested configurations of top open-source, closed-source, multi-vector, and single-dense embedding models, but I can only choose so many to put on a graph, and I'm not in the business of giving Voyage free advertising haha. I agree with you that there is some complexity in multimodal reranking w.r.t. inference-time speed as well as data transfer / network latency costs. Happy to talk more :)


moojacob 473 days ago

I messed up, I apologize.

I looked at the NDCG and thought that was the dataset, since Voyage and Cohere both used NDCG. I now realize they were separate benchmarks with the same evaluation metric.
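For anyone else who trips on this: NDCG is a ranking metric computed per query from graded relevance labels, so two NDCG numbers only compare if they come from the same queries and corpus. A minimal sketch of NDCG@k follows. The function names and relevance labels are made up for illustration, it uses linear gain, and it takes the ideal ordering from the retrieved list only, which is a simplification. Real benchmarks may differ on both points.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # enumerate() is 0-based, so rank 0 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance labels for the results one model returned,
    # in ranked order, on two unrelated datasets.
    dataset_a_ranking = [3, 2, 0, 1]
    dataset_b_ranking = [0, 1]
    print(f"NDCG@4 on dataset A: {ndcg_at_k(dataset_a_ranking, 4):.3f}")
    print(f"NDCG@2 on dataset B: {ndcg_at_k(dataset_b_ranking, 2):.3f}")
```

Both numbers are "NDCG", but they describe different queries and documents, so putting them side by side says nothing about which model is better.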
