by Twixes 1705 days ago
AFAIK there isn't any great way. pg_similarity etc. offer useful distance functions, but they don't help with big-O here: every query still has to compare against every row. And the indexing capabilities that exist are only useful for geometry/geography, not abstract metric spaces (in the mathematical sense of "metric space", examples of metrics being Hamming distance or Levenshtein distance). I actually haven't found a DBMS optimized for metric spaces at all.

The next-best solution I've got is just using a columnar DBMS like ClickHouse. It still needs to scan all values, but at least it reads them from a blob which _only stores that specific column_, which is hugely faster than parsing whole rows.

The lack of an ideal solution is why I'm building Emdrive, an RDBMS with first-class support for similarity search, based on indexing with an M-tree variation. Still very early stages ;) https://github.com/Twixes/emdrive
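
To make the "metric space" point concrete, here is a rough sketch of why a metric can be indexed at all: the triangle inequality lets you rule out rows without computing their distance to the query. This is not an M-tree and not Emdrive's code, just a single-pivot filter showing the same pruning idea that metric trees build on. The names `PivotIndex` and `range_query` are made up for illustration.

```python
from typing import Callable, List, Sequence


def hamming(a: str, b: str) -> int:
    """Hamming distance; only defined for strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(ca != cb for ca, cb in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical one-pivot index: stores d(pivot, x) for every item x."""

    def __init__(self, items: Sequence[str], pivot: str,
                 distance: Callable[[str, str], int] = levenshtein):
        self.pivot = pivot
        self.distance = distance
        # Precomputed once at build time, like an index would be.
        self.entries = [(distance(pivot, x), x) for x in items]
        self.distance_calls = 0  # query-time distance computations on items

    def range_query(self, query: str, radius: int) -> List[str]:
        d_qp = self.distance(query, self.pivot)
        hits = []
        for d_px, x in self.entries:
            # Triangle inequality: d(q, x) >= |d(q, p) - d(p, x)|.
            # If that lower bound already exceeds the radius, skip x.
            if abs(d_qp - d_px) > radius:
                continue
            self.distance_calls += 1
            if self.distance(query, x) <= radius:
                hits.append(x)
        return hits


words = ["kitten", "sitting", "mitten", "bitten", "a", "database", "metric"]
index = PivotIndex(words, pivot="kitten")
hits = index.range_query("fitten", radius=1)
print(sorted(hits), "distance calls:", index.distance_calls)
```

A full table scan would compute the distance for all seven words; the pivot filter only computes it for the ones the lower bound can't exclude. A tree-shaped index applies the same test to whole subtrees, which is where the better-than-linear behaviour would come from.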
1 comment

This looks like a really interesting project! I'm going to check it out.