Hacker News
by hendrik_tilores 1375 days ago
I think the point there is that you want to have clean data. Sure, for your numbers it would look better if they were bigger, like for Twitter and the bots. However, you also need to see the other side, the operational one. If you have the same customer 5 times in your data, you will also target that customer 5 times with the same marketing initiative, so you will have 5 times the costs, etc.

When it comes to compliance, it gets even worse. If you have to answer a GDPR DSAR (data subject access request) and you have 5 different records for one person but only show one, you can get into serious trouble with the authorities and also pay high fines.
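
To make both points concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the records, the field names, the cost per contact, and the matching rule (a normalized email address). Real entity resolution needs far more robust matching than this.

```python
from collections import defaultdict

# Hypothetical customer records; names, emails, and ids are made up.
records = [
    {"id": 1, "name": "Jane Doe", "email": "jane.doe@example.com"},
    {"id": 2, "name": "Jane  Doe", "email": "Jane.Doe@Example.com "},
    {"id": 3, "name": "J. Doe", "email": "jane.doe@example.com"},
    {"id": 4, "name": "John Smith", "email": "john@example.com"},
]


def normalize_email(email):
    """Naive matching key (an assumption): trim whitespace and lowercase."""
    return email.strip().lower()


def group_by_person(recs):
    """Group record ids that share the same normalized email."""
    groups = defaultdict(list)
    for rec in recs:
        groups[normalize_email(rec["email"])].append(rec["id"])
    return dict(groups)


groups = group_by_person(records)

# Operational side: one marketing contact per record vs. one per person.
cost_per_contact = 2.0  # assumed cost of a single contact
cost_with_duplicates = len(records) * cost_per_contact
cost_deduplicated = len(groups) * cost_per_contact

# Compliance side: a DSAR answer has to cover every record of the person.
dsar_record_ids = groups[normalize_email("jane.doe@example.com")]

print(groups)
print(cost_with_duplicates, cost_deduplicated)
print(dsar_record_ids)
```

With duplicates you pay for four contacts instead of two, and a DSAR for Jane that returns only record 1 would miss records 2 and 3.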

So I think a smaller amount of high-quality data is worth more than a lot of trash data.

1 comment

You can read more about the mentioned Twitter ER (entity resolution) problem here: https://www.linkedin.com/feed/update/urn:li:activity:6931955...