| Hey Simon, Elliott here from Cohere. We benchmarked against Nomic's models on our consortium of datasets spanning text-only, image-only, and mixed modalities. Without publishing additional benchmarks, I am confident in saying that our model is more performant. At Cohere, we have not deprecated any of our embedding models since we started (I know because I've been there that long), and if we were to start doing so, I would take into account the worry of ensuring our users have a way of accessing our models. One aspect that isn't factored in here is efficiency. Yes, there might be strong open weight models, but if you're punching at the 7bn+ weight class your serving requirements are vastly different from a throughput efficiency perspective (as is your query-inference speed). All food for thought. That being said, if for your use-case Nomic Embed Vision 1.5 is better than Embed-v4.0, happy to hop on a call to discuss where the differential may be. |
This matters for embedding models because I'm presumably building up a database of many millions of vectors for later similarity comparisons - so I need to know I'll be able to embed an arbitrary string in the future in order for that investment to still make sense.
Size doesn't matter much to me. I don't even need to be able to run that model; it's more about having an insurance policy for my own peace of mind.
(Even a covenant that says "in the event that Cohere goes out of business this model will be made available under license X" would address this itch for me.)