by K0IN 597 days ago
This is very interesting. I was building something similar, but I used https://github.com/K0IN/string-embed (embeddings based on a distance function, Levenshtein in my case) to generate embeddings for deterministic matching.

I will follow your project; I'm interested in your ANN search speeds :)

1 comment

Very cool :) I initially tried something like this, but had trouble getting reliable results without tuning my distance functions to the specific schema & domain. Did you find a way around that?
No, I tuned a model on my (unique) table data, which does not take long, since the model is small.

In my tests, at least, the model held up well enough, since it's only used as a preselect to find "good enough" candidates that are then compared with Levenshtein.

But yes, a universal model (maybe a fine-tuned transformer / embedding model) might be better, but I have not had the time (or the knowledge) to build one yet.
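
For anyone who wants to see the shape of the preselect-then-Levenshtein idea, here is a rough sketch. It is not the string-embed API and not the tuned model described above: `embed` is a hypothetical stand-in that uses character bigram counts, the preselect is a brute-force cosine ranking rather than a real ANN index, and the names `embed`, `cosine`, and `match` are made up for illustration.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def embed(s: str) -> Counter:
    """Hypothetical stand-in for a trained embedding: character bigram counts."""
    padded = f" {s.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * \
        math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def match(query: str, candidates: list[str], k: int = 3) -> str:
    """Preselect the k most similar candidates, then pick by exact Levenshtein."""
    q = embed(query)
    # Stage 1: cheap preselect (brute force here; an ANN index would go here).
    shortlist = sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
    # Stage 2: exact distance only on the shortlist.
    return min(shortlist, key=lambda c: levenshtein(query, c))


names = ["John Smith", "Jane Smythe", "Bob Jones", "Joan Smit", "Alice Wu"]
print(match("Jon Smith", names))
```

The point of the split is that the exact distance only runs on `k` candidates instead of the whole table, so the embedding only has to be good enough to keep the right answer in the shortlist.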