Hacker News
by remolacha 582 days ago
Not quite what you're describing, but I open-sourced a fuzzy deduplication tool last week: https://dedupe.it
I'd be interested in expanding it to handle data cleaning more broadly.
1 comment

Not sure if you have introduced an artificial delay, but deduping ~25 rows shouldn't take 5+ seconds...

Edit: I see you're using an LLM, but "~$8.40 per 1k records" sounds unsustainable.
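
For a rough sense of scale, here is a back-of-the-envelope sketch that extrapolates the quoted "~$8.40 per 1k records" figure. It assumes the price scales linearly with record count, which may not match the tool's actual pricing; the function name and the sample sizes are just illustrative.

```python
# Quoted figure from the comment above: ~$8.40 per 1,000 records.
PRICE_PER_1K_USD = 8.40


def estimated_cost_usd(num_records: int, price_per_1k: float = PRICE_PER_1K_USD) -> float:
    """Linearly extrapolate the quoted per-1k price (linear scaling is an assumption)."""
    return num_records / 1000 * price_per_1k


# Illustrative dataset sizes, not taken from the thread (except the ~25-row case).
for n in (25, 1_000, 100_000, 1_000_000):
    print(f"{n:>9,} records -> ${estimated_cost_usd(n):,.2f}")
```

Under that assumption, a million-record dataset would come to roughly $8,400, which is the kind of number that makes the per-record price look hard to sustain.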