by chiscript
2336 days ago
I see... There are other solutions that claim to use AI to train or generate models for their apps. I'm not really sure how effective they are, though.
That said, Flookup can help you flag dupes quite well. I tried in several ways to compensate for the fact that no algorithm is a one-size-fits-all solution.
For example, Flookup allows you to dictate which stop words to remove, combine lookup variables for more specificity, or even return the next best match in case the first one isn't to your liking.
All this makes it quite malleable and usable for a case like yours.
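To make the stop-word and next-best-match ideas concrete, here is a minimal sketch in plain Python. It is not Flookup's implementation: the stop-word list, the `normalize` and `ranked_matches` helpers, and the sample company names are all made up for illustration, and the similarity score comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical stop-word list; a real one would be chosen by the user.
STOP_WORDS = {"the", "inc", "llc", "ltd", "co"}


def normalize(name, stop_words=STOP_WORDS):
    """Lowercase, drop simple punctuation, and remove stop words."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(t for t in tokens if t not in stop_words)


def ranked_matches(query, candidates, stop_words=STOP_WORDS):
    """Return (score, candidate) pairs sorted from best to worst match."""
    q = normalize(query, stop_words)
    scored = [
        (SequenceMatcher(None, q, normalize(c, stop_words)).ratio(), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


candidates = ["Acme Inc.", "Acme Holdings", "Zenith LLC"]
ranked = ranked_matches("The ACME, Inc", candidates)

best = ranked[0][1]       # top-scoring candidate
next_best = ranked[1][1]  # fallback if the top match isn't the right one
```

Keeping the whole ranked list, rather than only the top hit, is what makes a "next best match" option cheap to offer.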
|
Specific models might be an interesting add-on. Address parsing, normalization, and deduplication (with potential covariates like phone number, email address, etc.) are a massive pain in the ass for any data engineer who works with sales or marketing folks. Their databases (CRMs) are awful -- cleaning them up was always a chore, but it measurably saved money (imagine you mail physical cards and only want one per customer... but you have 5 different contacts at that company for 3 unique individuals).
I would have paid for a deduplication service -- say, quarterly batches at somewhere above $500/quarter for e.g. 20-50k contacts.
The one-size-fits-all part isn't really a value add for me; that wasn't so much my issue. For other target users, I can see the use -- for them, the interface is the value add, especially if you can read/write Excel files directly.
Stop words aren't something I used in my deduplication efforts. How many of your users request or use this? What kind of stop words do you want to exclude when comparing two entries? I would be worried that stop words still carry information: "The Store" versus "Store" might be a significant difference.
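The "The Store" versus "Store" concern can be shown with a small sketch. This is only an illustration under assumed names: the stop-word list and the helpers `strip_stop_words`, `same_after_stripping`, and `dropped_tokens` are hypothetical, not part of any tool mentioned here.

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of"}


def tokens(name):
    """Split a name into lowercase tokens."""
    return name.lower().split()


def strip_stop_words(name):
    """Rebuild the name without any stop words."""
    return " ".join(t for t in tokens(name) if t not in STOP_WORDS)


def same_after_stripping(a, b):
    """True when two names become identical once stop words are removed."""
    return strip_stop_words(a) == strip_stop_words(b)


def dropped_tokens(a, b):
    """Stop words present in one name but not the other."""
    return (set(tokens(a)) ^ set(tokens(b))) & STOP_WORDS


collapsed = same_after_stripping("The Store", "Store")
lost = dropped_tokens("The Store", "Store")
```

Here `collapsed` is true and `lost` contains `"the"`, so one cautious option would be to send such pairs to manual review instead of merging them automatically.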