by RobinL
718 days ago
The Fellegi-Sunter model can estimate the importance of different types of information from the data itself (i.e. unsupervised learning). For instance, a match on a date of birth column lends a greater weight of evidence in favour of a match than a match on first name (since dob has higher cardinality). The method can also estimate weights for fuzzy matches (how much evidence in favour of a match a close match on dob, say with a one-character difference, provides), and how much evidence against a match a mismatch provides. For instance, if you have very high data quality on gender, then a match on gender doesn't tell you much, but a mismatch on gender is quite strong evidence against the idea that two records match. I have a blog post here that delves into this a bit more:
https://www.robinlinacre.com/fellegi_sunter_accuracy/
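To make the gender example concrete, here is a rough sketch of how the weights fall out of the model's two parameters per column: m (the probability the column agrees when two records really are the same entity) and u (the probability it agrees when they are not). The m and u values below are made-up numbers for illustration, not estimates from any real dataset, and the function names are my own; in practice the model estimates m and u from the data rather than having them supplied by hand.

```python
import math


def match_weight(m: float, u: float) -> float:
    """Log2 Bayes factor for observing agreement on a column."""
    return math.log2(m / u)


def mismatch_weight(m: float, u: float) -> float:
    """Log2 Bayes factor for observing disagreement on a column."""
    return math.log2((1 - m) / (1 - u))


# Hypothetical (m, u) pairs, chosen only to illustrate the argument:
#   dob        - high cardinality, so chance agreement (u) is tiny
#   first_name - lower cardinality, so chance agreement is more common
#   gender     - very clean data (m near 1), but agrees by chance half the time
params = {
    "dob": (0.95, 0.0001),
    "first_name": (0.90, 0.01),
    "gender": (0.99, 0.5),
}

weights = {
    col: (match_weight(m, u), mismatch_weight(m, u))
    for col, (m, u) in params.items()
}

for col, (w_match, w_mismatch) in weights.items():
    print(f"{col:<10} match {w_match:+.2f}   mismatch {w_mismatch:+.2f}")
```

With these assumed values, a dob match carries roughly twice the weight of a first name match, a gender match adds under one unit of weight, and a gender mismatch subtracts more than five.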