|
|
|
|
|
by naillo
970 days ago
|
|
This is so overcomplicated. You don't need an actual tree that's just a mental model. All you need to do is generate N random hyperplanes (defined via it's normal i.e. generate N normalized vectors in the dimension of your choosing). Then you generate a hash via these by going normal by normal and checking if your query vector `q` is on the left or right side of the normal by checking the sign of the dot product. I.e. `hash = [sign(dot(q, n1)), sign(dot(q, n2)), ...]`. So given a query you find that hash and you get all the vectors with the same hash (using your favorite hash to array data structure, i.e. a dict) and you can further refine the search within that bucket. It's like 10 rows of code. |
|
(but HNSW is generally much better than this tree paritioning scheme)