by jacques_chester 2316 days ago
There's a whole subfield of information science dedicated to basically this exact problem: entity resolution.

Hilariously, it has dozens of names, because it just comes up in so many places for so many people. It appears that "record linkage" is the term that has won the top spot at Wikipedia: https://en.wikipedia.org/wiki/Record_linkage

1 comment

Record linkage seems to be unrelated. While OP isn't sure how to segregate and join data, he has perfect joining capability through unique indices.

Record linkage seems to be concerned with joins that aren't guaranteed to be correct because there are no unique keys.
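To make the distinction concrete, here is a rough sketch in Python. The data and the function names (`exact_join`, `fuzzy_link`, `normalize`) are made up for illustration and are not from OP's setup. The similarity measure is the standard library's `difflib.SequenceMatcher`, standing in for whatever a real record linkage tool would use, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical data, invented for this example.
customers = {101: "Acme Corp", 102: "Globex Inc"}
orders = [(101, "widgets"), (102, "gears")]


def exact_join(customers, orders):
    """Join on a unique key: each order maps to exactly one customer."""
    return [(customers[cid], item) for cid, item in orders if cid in customers]


def normalize(name):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_link(left, right, threshold=0.8):
    """Record linkage sketch: no shared key, so pair names by similarity.

    A link is only a guess; the threshold (an assumption here) trades
    false matches against missed matches.
    """
    links = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            links.append((a, best, round(score, 2)))
    return links


print(exact_join(customers, orders))
print(fuzzy_link(["ACME Corp.", "Initech"], list(customers.values())))
```

With a unique key, the first function is just a lookup and cannot pair the wrong records. The second has to guess, which is the situation record linkage deals with.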