by social_quotient 1045 days ago
Regarding the embedding vectors - is there a maximum limit to their dimensionality? Also, can you share insights into how the precision remains consistent at 99.9% even with high-dimension vectors?

For now we don't put a limit on the dimension of the vectors, so a machine can hold as many vectors as fit in memory: the raw data takes #vectors * #dimensions * sizeof(float) bytes. For now we only support dense vectors; in the future we will work on sparse vector support for much higher dimensions.

I think you are referring to the "curse of dimensionality" problem. Here are my thoughts: in a graph-based index such as SpeedANN or HNSW, each vector is treated as a node in the graph, and the index is a nearest neighbor graph. Unlike in spatial partition-based indices, the topology quality of the nearest neighbor graph is independent of the dimensionality of the vectors. Our benchmark is on 960-dimensional vectors, but we will run more experiments on sparse vectors in the future.
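
To put rough numbers on the memory formula, here is a minimal sketch in Python. It assumes 4-byte (32-bit) floats and counts only the raw vector data; the graph index itself (neighbor lists and so on) adds overhead that is not estimated here. The function names are made up for illustration and are not part of any library.

```python
FLOAT32_BYTES = 4  # assumption: each vector component is a 32-bit float


def raw_vector_bytes(num_vectors: int, dim: int,
                     bytes_per_value: int = FLOAT32_BYTES) -> int:
    """Bytes needed for the raw dense vectors: #vectors * #dimensions * sizeof(float)."""
    return num_vectors * dim * bytes_per_value


def max_vectors_in_memory(memory_bytes: int, dim: int,
                          bytes_per_value: int = FLOAT32_BYTES) -> int:
    """Largest number of dense vectors whose raw data fits in memory_bytes."""
    return memory_bytes // (dim * bytes_per_value)


if __name__ == "__main__":
    # One million 960-dimensional vectors (the dimension used in the benchmark).
    print(raw_vector_bytes(1_000_000, 960))        # 3840000000 bytes, about 3.6 GiB
    # How many 960-dimensional vectors fit in 16 GiB of raw storage.
    print(max_vectors_in_memory(16 * 2**30, 960))  # 4473924
```

So at 960 dimensions, a 16 GiB budget holds roughly 4.5 million vectors before any index overhead, and doubling the dimension halves that count.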