by cupofpython 1406 days ago
Having worked in the data industry, this sounds about right. Digital fingerprinting is certainly real, but I was way more paranoid about what I thought companies knew about me before working in the industry. The data quality across the board is dogshit. Even for the best companies doing B2B data, like D&B and ZoomInfo, which are talked about as being better than most of the others, it's still mostly dirt.

Data right now is typically bought and sold with an expectation that most of it is crap. It's faster to buy and process 5000 dirty items that probably have a few good leads buried in them than to find leads manually / naturally or broadcast random advertising. (I left the industry in 2020 and my NDA expired in 2021)

Data quality is typically assessed at the "Does this data field have a value for this line item" level. That means data vendors are financially incentivized to make shit up about you as much as they can get away with. Think about it for a second: these companies are selling themselves as the source of truth. The actual accuracy does not matter, and the better you are, the less data your customers buy. The data goes stale faster than the accuracy of the data becomes relevant.
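
To make the "full vs. right" distinction concrete, here's a rough Python sketch. The records, the field name, and especially the `truth` table are all made up for illustration; in practice you don't have a truth table, which is the whole problem.

```python
def fill_rate(records, field):
    """Share of records where `field` has any non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def accuracy(records, truth, field):
    """Share of records whose `field` matches a ground-truth lookup by id.

    `truth` is a hypothetical {id: correct_value} mapping. Buyers almost
    never have one, so this number usually can't be computed at all.
    """
    if not records:
        return 0.0
    correct = sum(1 for r in records if r.get(field) == truth.get(r["id"]))
    return correct / len(records)


# Made-up vendor output: every row has a value, so it looks "complete".
records = [
    {"id": 1, "nationality": "french"},
    {"id": 2, "nationality": "french"},
    {"id": 3, "nationality": "french"},
]
# Made-up ground truth for the same three people.
truth = {1: "french", 2: "german", 3: "american"}

print(fill_rate(records, "nationality"))        # the metric that gets checked
print(accuracy(records, truth, "nationality"))  # the metric that doesn't
```

A vendor that fills every blank with a guess maxes out the first number without having to care about the second.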

Did you like a post about a fresh baked baguette that had #french as one of the 100 tags associated with it? congrats, you're french now. it's not exactly this ridiculous, but you get my point

2 comments

If you purchase a data source, how do you verify how good it is? Or do people typically just not do that?
you can verify how full it is.

there are some verification-focused services, e.g. they take a list of emails and check if they are valid email addresses. Some use fine print to say they are only validating whether the address has valid email FORMATTING, and make no claim about whether or not the email will bounce. Verifying that the email address actually belongs to the person it claims to belong to is not part of the deal.
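
For a sense of how little "valid formatting" means, here's a minimal Python sketch of a format-only check. The regex is a deliberately loose assumption of mine (something@something.something), not what any particular service uses and not a full RFC 5322 parser.

```python
import re

# Loose shape check: one "@", no whitespace, at least one dot in the domain.
# This is an illustrative pattern, not a complete address grammar.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(address):
    """True if the string has the rough shape of an email address.

    Says nothing about whether the mailbox exists, whether mail will
    bounce, or whether it belongs to the person named in the record.
    """
    return EMAIL_SHAPE.fullmatch(address) is not None


print(looks_like_email("nobody@example.invalid"))  # well-formed, undeliverable
print(looks_like_email("not-an-email"))
```

The first address passes even though `.invalid` is a reserved TLD that can never receive mail, so a list can be "100% verified" by this kind of check and still bounce.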

it's a nearly impossible task, because you have no actual source of truth to verify it against. So data vendor A and B give you different results for the same search - now what? You have to manually research and see who's "right" or "more recent".
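
The most you can automate is finding where the vendors disagree. A small Python sketch, assuming (hypothetically) that both feeds are keyed by company domain and that the field names line up, which in real life they usually don't:

```python
def find_conflicts(vendor_a, vendor_b, fields):
    """List (key, field, a_value, b_value) wherever two vendors disagree.

    Only flags disagreement. It cannot say which side is right, and two
    vendors agreeing does not mean either of them is correct.
    """
    conflicts = []
    for key in sorted(vendor_a.keys() & vendor_b.keys()):
        for field in fields:
            a_val = vendor_a[key].get(field)
            b_val = vendor_b[key].get(field)
            if a_val != b_val:
                conflicts.append((key, field, a_val, b_val))
    return conflicts


# Made-up records for the same company from two hypothetical vendors.
vendor_a = {"acme.example": {"employees": 120, "ceo_email": "jane@acme.example"}}
vendor_b = {"acme.example": {"employees": 450, "ceo_email": "jane@acme.example"}}

for conflict in find_conflicts(vendor_a, vendor_b, ["employees", "ceo_email"]):
    print(conflict)
```

Everything this prints still ends up in a human's research queue.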

even if it looks like good data, it might be stale. For example, company size, revenue, C-level email addresses, etc. all change over time.

so if a customer wants cleaner data, you basically charge them to pump the dataset through Mechanical Turk or Upwork or something to have people try to verify things manually. Datasets can be large though and this gets expensive, so it tends to be better to just buy the crap data for cheaper and figure it out yourself.

I have a conspiracy theory that these verification services are behind a lot of the phone spam today. They are just checking if your phone number is valid; they don't actually care if you answer.

> data vendors are financially incentivized to make shit up about you as much as they can get away with

Exactly this. But they can get away with basically anything. Worst case for them is they show you a premium ad you aren’t interested in. Best case is they guess correctly