by xtracto
95 days ago
Edit_distance uses pure Levenshtein, which is quadratic, so on tables of 500k rows with 20+ columns each it slows to a crawl. Without going into a lot of detail, I needed this to work for datasets of that size, so a lot of optimization "tricks" and pre-processing had to be done. Otherwise, simple merges in pandas or SQL/DuckDB would have sufficed.
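The comment doesn't say which tricks were used. Below is a minimal sketch of two common ones, offered as assumptions and not as the author's approach. The first is a threshold-bounded Levenshtein that rejects a pair early, either because the lengths differ by more than the threshold or because a whole DP row already exceeds it. The second is "blocking", where only strings sharing a cheap key are compared, so you avoid the full rows-times-rows cross product. The names `bounded_levenshtein` and `fuzzy_join`, the threshold `k`, and the first-character blocking key are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def bounded_levenshtein(a: str, b: str, k: int) -> Optional[int]:
    """Return the edit distance between a and b if it is <= k, else None.

    Still O(len(a) * len(b)) in the worst case, but bails out early:
    - the distance is at least |len(a) - len(b)|, so a length check
      rejects many pairs without building the DP table;
    - every alignment passes through each DP row, so once the smallest
      value in a row exceeds k, the final distance must exceed k too.
    """
    if abs(len(a) - len(b)) > k:
        return None
    prev = list(range(len(b) + 1))  # row 0: distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            )
        if min(curr) > k:  # early exit: no alignment can finish within k
            return None
        prev = curr
    return prev[-1] if prev[-1] <= k else None


def fuzzy_join(left: list[str], right: list[str], k: int) -> list[tuple[str, str, int]]:
    """Pair strings from left and right whose edit distance is <= k.

    Hypothetical blocking key: the first character. Only strings in the
    same block are compared, which avoids the full cross product but will
    miss matches whose first characters differ.
    """
    blocks: dict[str, list[str]] = defaultdict(list)
    for t in right:
        blocks[t[:1]].append(t)
    matches = []
    for s in left:
        for t in blocks.get(s[:1], []):
            d = bounded_levenshtein(s, t, k)
            if d is not None:
                matches.append((s, t, d))
    return matches
```

For example, `fuzzy_join(["kitten", "apple"], ["sitting", "kitchen", "apply"], 2)` yields `("kitten", "kitchen", 2)` and `("apple", "apply", 1)`. Note that "kitten"/"sitting" is never compared because of the blocking key, which is exactly the recall-for-speed trade-off such tricks make. In practice you would pick a blocking key suited to the column (e.g. a normalized prefix or phonetic code) and tune `k` per column.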