by dmitrykan
1715 days ago
Yes, for larger datasets of vectors, a linear scan at search time will likely be slow. You could take a look at KNN/ANN (k-nearest neighbors / approximate nearest neighbors) libraries, like https://faiss.ai/. But if you prefer to offload this complexity to a database, you can pick and evaluate one from my blog post. Four of the six DBs are open source, and the two commercial ones offer a managed service. All six can scale to quite large numbers of vectors. My goal is to systematically study each DB through the lens of a specific search task, which will not be (only) text based. If you think we could collaborate in some way on your dataset, that would be fantastic and probably a learning experience for both sides.
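To make the linear-scan point concrete, here is a minimal sketch of exact k-nearest-neighbor search in plain Python. It is only an illustration: the function names, the choice of cosine similarity, and the toy vectors are my assumptions, not anything from Faiss or the DBs in the blog post. Every query compares against every stored vector, so the cost grows with the dataset size times the vector dimension, which is the part that ANN indexes try to avoid.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_linear_scan(query, vectors, k):
    """Exact k-NN: score the query against every vector, keep the k best.

    Returns a list of (similarity, index) pairs, best first.
    """
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)


# Hypothetical toy data, just to show the call shape.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top = knn_linear_scan([1.0, 0.0], vectors, k=2)
top_ids = [i for _, i in top]
print(top_ids)  # [0, 2]
```

This is fine for a few thousand vectors. Beyond that, an ANN index typically trades a little recall for a large speedup.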