by gravypod
2471 days ago
There are no good exact indexing structures, but there are a lot of very high-performance approximate nearest-neighbor (NN) structures. Facebook has an open source implementation of some of these in a project called faiss [0], which does a relatively good job of this.

[0] - https://github.com/facebookresearch/faiss
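
To make "approximate NN structure" concrete, here is a minimal pure-Python sketch of one such idea, random-hyperplane locality-sensitive hashing. This is not faiss's API or how faiss is implemented; the class name `HyperplaneLSH`, its parameters, and the helper `cosine` are made up for illustration. It trades recall for speed by only comparing the query against vectors that hash to the same bucket.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HyperplaneLSH:
    """Toy approximate NN index (hypothetical, for illustration only)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the bucket key.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # The bit records which side of each hyperplane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0
            for plane in self.planes
        )

    def add(self, item_id, vec):
        self.buckets[self._key(vec)].append((item_id, vec))

    def query(self, vec, k=1):
        # Only the query's own bucket is scanned, so true neighbors that
        # landed in a different bucket are missed (hence "approximate").
        candidates = self.buckets.get(self._key(vec), [])
        ranked = sorted(candidates, key=lambda c: cosine(vec, c[1]),
                        reverse=True)
        return [item_id for item_id, _ in ranked[:k]]
```

More bits mean smaller buckets and faster queries but more missed neighbors; production libraries combine several tables or use other structures entirely to recover recall.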
We've frequently had the same dream of adding more native support for nearest-neighbor type queries, since that is the workhorse of so many useful techniques in the modern NLP stack.
Right now, we have lots of dense vectors stored in massive TOAST tables in PG. It's faster to fetch them than to recompute them, especially since there are a number of preprocessing steps that limit what we pay attention to.
The discussion here about full text search versus semantic search is interesting. In our experience, both are highly relevant. Sometimes it's most useful for our customers to segment their conversation data by exact text matches, and other times semantic clustering is most effective. I think there's plenty of reason to offer both kinds of capabilities.
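
As a rough illustration of the two kinds of segmentation, here is a small sketch. The function names, the toy conversations, and the hand-written 2-d "embeddings" are all hypothetical; real embeddings would come from a model and have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_match(conversations, phrase):
    """Ids of conversations whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [cid for cid, text in conversations.items()
            if needle in text.lower()]


def semantic_match(embeddings, query_vec, threshold=0.8):
    """Ids of conversations whose embedding is close to the query vector."""
    return [cid for cid, vec in embeddings.items()
            if cosine(vec, query_vec) >= threshold]


# Toy data: "a" and "b" mean the same thing but share no keywords.
conversations = {
    "a": "I want a refund for my order",
    "b": "Please give me my money back",
    "c": "What are your opening hours?",
}
embeddings = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}

by_text = exact_match(conversations, "refund")
by_meaning = semantic_match(embeddings, [1.0, 0.0])
```

With this toy data the exact match finds only the conversation that literally says "refund", while the similarity threshold also picks up the paraphrase, which is the case for offering both.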