Very cool :) I initially tried something like this, but had trouble getting reliable results without tuning my distance functions to the specific schema & domain. Did you find a way around that?
No, I tuned a model on my (unique) table data, which does not take long, since the model is small.
In my tests, at least, the model held up well enough, since it's only used as a preselection step to find "good enough" candidates, which are then compared using Levenshtein distance.
But yes, a universal model (maybe a fine-tuned transformer / embedding model) might be better; I just haven't had the time (or the knowledge) to build one yet.
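A rough sketch of that two-stage idea (cheap preselect, then Levenshtein on the shortlist), in case it helps. This is only an illustration: the tuned model isn't shown in this thread, so a plain character-trigram Jaccard overlap stands in for it here, and `trigrams`, `preselect`, `levenshtein`, and `best_match` are made-up names, not anything from the actual project.

```python
def trigrams(s):
    """Set of character trigrams, padded so very short strings still yield one."""
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def preselect(query, candidates, k=3):
    """Cheap first pass (ASSUMPTION: stand-in for the tuned model).

    Keeps the k candidates with the highest trigram Jaccard overlap.
    """
    q = trigrams(query)

    def jaccard(candidate):
        t = trigrams(candidate)
        return len(q & t) / len(q | t)

    return sorted(candidates, key=jaccard, reverse=True)[:k]


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def best_match(query, candidates, k=3):
    """Second pass: run the more expensive Levenshtein only on the shortlist."""
    shortlist = preselect(query, candidates, k)
    return min(shortlist, key=lambda c: levenshtein(query.lower(), c.lower()))
```

Usage would be something like `best_match("Jon Smith", names)`. The point of the split is that the preselect only has to be good enough to keep the right answer somewhere in the top k, since Levenshtein does the final ranking.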