Same with OpenSearch and Elasticsearch, both of which have added vector search as well (with slight differences between their implementations). And since vector search is computationally expensive, there is a lot of value in narrowing down your result set with a regular query first and then calculating the best matches from only that narrowed-down set.
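To make the filter-then-rank idea concrete, here is a minimal sketch in plain Python. It is not how Elasticsearch or OpenSearch implement it internally; the document layout, the `lang` field, and the `filtered_knn` helper are all made up for illustration, and the ranking is brute-force cosine similarity rather than an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_knn(docs, query_vector, predicate, k):
    """Hypothetical helper: apply a cheap filter, then rank only the survivors."""
    # Step 1: the "regular query" narrows the candidate set.
    candidates = [d for d in docs if predicate(d)]
    # Step 2: the expensive vector comparison runs only on those candidates.
    scored = [
        (cosine_similarity(d["embedding"], query_vector), d["id"])
        for d in candidates
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy data with 2-dimensional embeddings (real ones have hundreds of dimensions).
DOCS = [
    {"id": "a", "lang": "en", "embedding": [1.0, 0.0]},
    {"id": "b", "lang": "en", "embedding": [0.6, 0.8]},
    {"id": "c", "lang": "de", "embedding": [1.0, 0.1]},
    {"id": "d", "lang": "en", "embedding": [0.0, 1.0]},
]

top = filtered_knn(DOCS, [1.0, 0.0], lambda d: d["lang"] == "en", k=2)
print(top)
```

Document "c" is very close to the query vector but never gets scored, because the filter drops it before any similarity is computed. That is the whole saving: the vector math is paid for only on what the regular query lets through.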
From what I've seen, the big limitation currently is dimensionality. Most of the more advanced models have a high dimensionality, and Elasticsearch and Lucene in particular limit the dimensionality to 1024. E.g. several of the OpenAI models have a much higher dimensionality. OpenSearch works around this by supporting alternative implementations to Lucene for vectors.
Of course it's a sane limitation from a cost and computation point of view; these huge embeddings don't scale that well. But it does limit the quality of the results unless you can train your own models and tailor them to your use case.
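A rough back-of-the-envelope for the scaling point, as a small Python sketch. The assumptions are mine: vectors stored as raw 32-bit floats with no compression or index overhead, one million documents, and 1536 dimensions as the "larger" case (the size of OpenAI's text-embedding-ada-002 embeddings). The function names are made up for illustration.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Bytes needed to store the raw vectors, assuming float32 values."""
    return num_vectors * dims * bytes_per_value


def multiply_adds_per_query(num_vectors, dims):
    """Multiply-add operations for one brute-force dot-product scan."""
    return num_vectors * dims


NUM_DOCS = 1_000_000  # assumed corpus size

for dims in (1024, 1536):
    gigabytes = raw_vector_bytes(NUM_DOCS, dims) / 1e9
    ops = multiply_adds_per_query(NUM_DOCS, dims)
    print(f"{dims} dims: {gigabytes:.3f} GB raw, {ops:,} multiply-adds per scan")
```

Both storage and per-query work grow linearly with the number of dimensions, so going from 1024 to 1536 costs 1.5x on each before any index overhead is counted.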
If you are curious about how to use this stuff, I invested some time a few weeks ago getting my kt-search Kotlin library to support this and wrote some documentation for it: https://jillesvangurp.github.io/kt-search/manual/KnnSearch.h.... The quality was underwhelming IMHO, but that might be down to my complete lack of experience with this stuff.
I have no experience with Pinecone and I'm sure it's great. But I do share the sentiment that they might not come out on top here. There are too many players and it's a fast-moving field. OpenAI just moved the whole field forward enormously in terms of what is possible and feasible.
I wasn't making a personal recommendation to you. I was answering more broadly why someone would use Pinecone in the future.
Every new software company like this gets asked "why wouldn't everyone just use x existing open source project? Why even try to make it a real business with a hundred devs, actual support/marketing, and big ambitions to be more than a plugin to Postgres?"
Based on the videos and interviews with their lead dev that I've seen, Pinecone has quite large plans: integrating with a wider stack and with company databases, well beyond what they have done so far by releasing an early version of the DB.
Regardless, wider adoption driven by actual businesses investing in marketing/sales to seed ideas in the market can spur development and potentially advance the tooling across the wider market, which feeds back into open source.