Hacker News
by mckennameyer 153 days ago
Interesting approach with the cascade. How do you decide when to escalate from fuzzy matching to LLM?
1 comment

Fuzzy matching only makes sense if you expect the two columns to hold more or less the same data; otherwise you can skip that step.

Then you have to pick a threshold: if the similarity of two strings is above it, it's a match; otherwise it's not. The threshold should be high to prevent false positives. The LLM will take care of the non-matches.
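
Roughly, the escalation could look like the sketch below. This is only an illustration, not the actual implementation: `similarity`, `cascade_match`, the 0.9 default threshold, and the `llm_match` callback are all hypothetical names and values, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure is really used.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical signature for the LLM step: (value, candidates) -> chosen candidate or None.
LlmMatcher = Callable[[str, Sequence[str]], Optional[str]]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after light normalization (lowercase, trimmed)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def cascade_match(
    value: str,
    candidates: Sequence[str],
    threshold: float = 0.9,  # assumed value; kept high to avoid false positives
    llm_match: Optional[LlmMatcher] = None,
) -> Tuple[Optional[str], str]:
    """Try fuzzy matching first; escalate to the LLM only for non-matches."""
    # Pick the candidate most similar to the value, if there are any candidates.
    best = max(candidates, key=lambda c: similarity(value, c), default=None)
    if best is not None and similarity(value, best) >= threshold:
        return best, "fuzzy"
    # Below the threshold: hand the value over to the (hypothetical) LLM step.
    if llm_match is not None:
        return llm_match(value, candidates), "llm"
    return None, "unmatched"
```

With a high threshold, something like "Acme Corp." against "Acme Corp" is settled by the fuzzy step, while "IBM" against "International Business Machines" scores far too low and falls through to the LLM.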