Hacker News
by xtracto 95 days ago
Edit_distance uses pure Levenshtein, which is quadratic, so for tables of 500k rows and 20+ columns each it will slow down to a crawl. Without going into a lot of detail, I needed this to work for datasets of that size, so a lot of "trick" optimizations and pre-processing had to be done.

Otherwise, simple merges in pandas or SQL/DuckDB would have sufficed.
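
To make the cost concrete, here is a minimal sketch of the textbook dynamic-programming Levenshtein, plus one generic pre-filter of the kind that can skip the expensive call. This is not the actual set of tricks mentioned above, which are not described here; the function names `levenshtein` and `within_distance` and the `max_dist` threshold are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: O(len(a) * len(b)) time per pair of strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to use less memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def within_distance(a: str, b: str, max_dist: int) -> bool:
    """Hypothetical helper: run cheap checks before paying for the full DP."""
    if a == b:
        return True
    # The length difference is a lower bound on the edit distance,
    # so pairs that differ too much in length can be rejected outright.
    if abs(len(a) - len(b)) > max_dist:
        return False
    return levenshtein(a, b) <= max_dist


print(levenshtein("kitten", "sitting"))
print(within_distance("abc", "abcdef", 2))
```

Every cell comparison that reaches `levenshtein` pays the full quadratic cost, so at 500k rows and 20+ columns the goal is to have as few pairs as possible get past the cheap checks.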