Hacker News
remolacha, 582 days ago
Not quite what you're describing, but I open-sourced a fuzzy deduplication tool last week:
https://dedupe.it
Would be interested in expanding it to deal with data cleaning more broadly.
turtlebits, 582 days ago
Not sure if you have introduced an artificial delay, but deduping ~25 rows shouldn't take 5+ seconds...
edit: I see you're using an LLM, but "~$8.40 per 1k records" sounds unsustainable.
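For comparison, here is a minimal sketch of fuzzy deduplication with no LLM, using only Python's standard-library `difflib.SequenceMatcher`. This is not how dedupe.it works. The function names `normalize` and `fuzzy_dedupe` and the 0.85 similarity threshold are hypothetical choices for illustration. The pairwise greedy approach is O(n^2), which is fine for ~25 rows (about 300 comparisons) but would need blocking or indexing for large tables.

```python
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    # Lowercase and collapse runs of whitespace so trivial formatting differences don't count
    return " ".join(s.lower().split())


def fuzzy_dedupe(rows: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first row of each group of near-duplicates (greedy, O(n^2)).

    The threshold is a hypothetical default; SequenceMatcher.ratio() returns a value in [0, 1].
    """
    kept: list[str] = []
    kept_norm: list[str] = []
    for row in rows:
        norm = normalize(row)
        # Drop this row if it is similar enough to any row already kept
        if any(SequenceMatcher(None, norm, k).ratio() >= threshold for k in kept_norm):
            continue
        kept.append(row)
        kept_norm.append(norm)
    return kept


rows = [
    "Acme Corp, 123 Main St",
    "ACME Corp,  123 Main St.",
    "Globex Inc, 9 Elm Ave",
]
print(fuzzy_dedupe(rows))
```

For scale on the cost point: ~$8.40 per 1k records works out to about $0.0084 per record, or roughly $8,400 per million records.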