Hacker News
by trhway 1776 days ago
The rule customization covers match rules and index generation rules, and all of these customizations are data source specific. A large company with a lot of departments and divisions, some of them former acquisitions, would have a number of different data sources, as well as some external ones, reference sources in particular. Beyond the basic issue of connecting to those data sources and pulling data from them, the data in them differs in reliability/quality, standard of maintenance, etc., and the sources play different roles in the system: some are master sources, some are only used to extract matches without being updated, and some are [also] consumers of the match/deduplication results, so they have to be updated back. As a result, any such ER system that actually gets implemented is totally one-off.
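To make the per-source customization concrete, here is a rough Python sketch of what such a configuration might look like. Everything in it is hypothetical and for illustration only: the names `SourceRole`, `SourceConfig`, `same_email`, `name_prefix_key`, the example sources and the reliability weights are assumptions, not taken from any particular ER product.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Set

Record = Dict[str, str]


class SourceRole(Enum):
    MASTER = auto()      # authoritative records
    MATCH_ONLY = auto()  # read for matching, never updated
    CONSUMER = auto()    # receives match/deduplication results back


@dataclass
class SourceConfig:
    name: str
    roles: Set[SourceRole]
    reliability: float                            # assumed 0..1 trust weight
    match_rule: Callable[[Record, Record], bool]  # source-specific match rule
    index_key: Callable[[Record], str]            # source-specific index rule


def same_email(a: Record, b: Record) -> bool:
    """Match rule: non-empty emails that are equal ignoring case."""
    ea, eb = a.get("email", "").lower(), b.get("email", "").lower()
    return ea != "" and ea == eb


def name_prefix_key(r: Record) -> str:
    """Index rule: first three letters of the name, lowercased."""
    return r.get("name", "")[:3].lower()


# Hypothetical sources: an internal master, an acquired system, an external reference.
SOURCES: List[SourceConfig] = [
    SourceConfig("crm", {SourceRole.MASTER, SourceRole.CONSUMER}, 0.9,
                 same_email, name_prefix_key),
    SourceConfig("acquired_erp", {SourceRole.MATCH_ONLY}, 0.6,
                 same_email, name_prefix_key),
    SourceConfig("external_reference", {SourceRole.MATCH_ONLY}, 0.95,
                 same_email, name_prefix_key),
]


def sources_to_update(configs: List[SourceConfig]) -> List[str]:
    """Names of sources that must receive match results back."""
    return [c.name for c in configs if SourceRole.CONSUMER in c.roles]
```

In a real deployment each source would likely carry its own match and index rules rather than sharing the two above, which is exactly where the one-off effort goes.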

Thanks a lot. Sounds similar to the issues we had at our previous company. We provide some basic ETL functionality to tackle these. It will be interesting to see where this eventually ends up: how much our solution handles there, and where it is better to use the ETL tools those companies probably already have.