by rfc 2493 days ago
I took issue with "Great Hack" as well. At some of my previous companies, we evaluated purchasing data from a lot of these vendors for data hydration purposes, Cambridge Analytica included. They didn't offer to sell, but we did have conversations around leveraging their platform to create insights.

What was funny to me in the whole process was that CA was the LEAST of what worried me. We were talking to Acxiom as well, from whom I could buy 500-1500 data points on 300M Americans for $250k-$500k. This included info like types of bank accounts, types of CC rewards, mortgages left, restaurant chain preferences, etc. They also had their own methods for creating data (the ML sauce), which produced psychographic profiles.
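
For a sense of scale, here is a back-of-envelope sketch that uses only the figures quoted above. How price maps to the number of data points is not stated, so the per-data-point bounds assume the two extreme pairings (cheapest price with the most points, highest price with the fewest). The function names are made up for illustration.

```python
# Back-of-envelope unit costs from the quoted figures:
# 500-1500 data points on 300M people for $250k-$500k.
PEOPLE = 300_000_000
PRICE_RANGE_USD = (250_000, 500_000)
POINTS_PER_PERSON_RANGE = (500, 1_500)


def cost_per_person(price_usd, people=PEOPLE):
    """Price of the whole file divided by the number of people in it."""
    return price_usd / people


def cost_per_data_point(price_usd, points_per_person, people=PEOPLE):
    """Price divided by the total number of data points in the file."""
    return price_usd / (people * points_per_person)


low_pp = cost_per_person(PRICE_RANGE_USD[0])
high_pp = cost_per_person(PRICE_RANGE_USD[1])

# ASSUMPTION: extreme pairings, used only to bound the range.
cheapest_point = cost_per_data_point(PRICE_RANGE_USD[0], POINTS_PER_PERSON_RANGE[1])
priciest_point = cost_per_data_point(PRICE_RANGE_USD[1], POINTS_PER_PERSON_RANGE[0])

print(f"per person:     ${low_pp:.5f} to ${high_pp:.5f}")
print(f"per data point: ${cheapest_point:.2e} to ${priciest_point:.2e}")
```

In other words, a fraction of a cent per person, under the stated assumptions.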

The other thing the general public doesn't realize about profile data is that it suffers from sporadic and episodic contributions, making accurate high-resolution profiles difficult to obtain. A lot of deduplication, profile merging, etc. has to happen before the data is usable.
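
To make the deduplication and profile-merging step concrete, here is a minimal sketch. It is not any vendor's actual method: it assumes records are linked whenever they share a normalized email or phone number, and the field names and sample records are invented for illustration. Real pipelines typically need fuzzy matching and confidence scores on top of this.

```python
from collections import defaultdict


def normalize_email(email):
    """Lowercase and trim; returns None when the field is missing."""
    return email.strip().lower() if email else None


def normalize_phone(phone):
    """Keep the last 10 digits; returns None when fewer than 10 are present."""
    digits = "".join(ch for ch in phone if ch.isdigit()) if phone else ""
    return digits[-10:] if len(digits) >= 10 else None


def merge_profiles(records):
    """Group records that share an email or phone, then merge each group."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (field, normalized value) -> first record index with it
    for idx, rec in enumerate(records):
        keys = [
            ("email", normalize_email(rec.get("email"))),
            ("phone", normalize_phone(rec.get("phone"))),
        ]
        for key in keys:
            if key[1] is None:
                continue
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)

    merged = []
    for root in sorted(groups):
        profile = {}
        for idx in groups[root]:
            for field, value in records[idx].items():
                # ASSUMPTION: the first non-empty value wins on conflict.
                if value and field not in profile:
                    profile[field] = value
        merged.append(profile)
    return merged


# Hypothetical sporadic contributions about two people.
records = [
    {"email": "Pat@Example.com", "phone": None, "bank": "checking"},
    {"email": None, "phone": "(555) 010-0199", "rewards": "airline"},
    {"email": "pat@example.com ", "phone": "555-010-0199", "mortgage": "yes"},
    {"email": "sam@example.com", "phone": None, "bank": "savings"},
]
profiles = merge_profiles(records)
print(len(profiles), "profiles from", len(records), "records")
```

Note that the first two records share nothing directly; they only merge because the third one bridges them. That transitive behavior is also why a single bad match can pollute a whole profile.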

Sure, CA had some shady shit going on. But from my perspective, they were tame relative to some of the other big players.

3 comments

Finally, someone else talking about Acxiom!

You want a bad guy, with no regard for privacy, making crazy inferences (and getting a lot wrong)? That’s your guy.

Yep! I was shocked when I got the data packet. I had no idea that I could get that much data on individuals.

Some other notes about the data: you can get additional hydrations 2-4x/yr @ $50k per pull. The data can be passed via API, but they expressed to me that MOST data delivery is done via SFTP in Excel spreadsheets. They purchase data from any and all vendors possible.

Another insane data provider is Datalogix by Oracle. They were some of the first to have a deep relationship with Facebook. They do identity merges across all the Oracle touch points. This includes POS data, auto data (including sensitive data), and their overall marketing cloud.

It's insane to me that these other guys are not front and center in this debate.

I'm still undecided if the most shocking part of all that is how bad their data is, dedup-wise; how ancient their tech is; how unethical it all is; or how much every journalist or researcher working on AI and Ethics could not care less about this.

What sucks is that I get the reason why the researchers don't care or gloss over it. It's why I initially got into the space: there's something magical about working on massive data sets and unlocking possibilities with them.

I worked on the largest social data set available from the major providers (~20PB or so) and it was super cool (from a PM perspective) to unlock the possibilities around analyzing the data set.

The idea that I can unlock insights and change behaviors is an alluring concept until it is used improperly or inappropriately. That was ultimately why I left the data/marketing world.

When is using data on people to manipulate their behaviors _not_ inappropriate? That's straight up evil.

When they opt in. Going to the gym more, stopping smoking (or other bad habits), driving safer, cutting down on media consumption.

Unfortunately positive change is harder to elicit than negative change.

There are companies trying to help you lose bad habits that let you program alerts, set budgets, etc.

- Freedom, RescueTime do that for your web-use;

- my bank (Monzo) and a stealth company by alumni thereof do it for your spending.

There have also been a lot of efforts at Facebook to show you content that would lead you to have more positive interactions, like posting similar things yourself rather than being a passive spectator.

I’m involved in several of those projects.

If I used data to find people who were starting to lean into anti-vax conspiracies and provided them with accurate information about vaccines to change the behaviour of some parents - is that straight up evil?

How about identifying people who are likely to fall for a scam (e.g. whose friends have just invested in a Ponzi scheme) and giving them info on how to avoid it?

Straight up evil is massively overplaying it.

Slightly different, but I'm worried about Pilgrim by Foursquare, which is now being used in increasingly large apps such as Snapchat.

Thanks. As someone who isn't familiar with the ads industry, who are / were other bigger players apart from Acxiom?
The ones I'm most familiar with are: Acxiom, Datalogix, DataSift, Equifax, Full Contact, and Experian. There's always Twitter as well, but what they offer is mostly abstracted data and you can't just waltz in and get their firehose.

Most of these are pay to play. IIRC, when Facebook was creating their ads platform in '06 or '07, they did a deep partnership with DataLogix, where DL was effectively powering their targeting/user classification. That's largely why you still see dumb ad classifications (e.g. Farmers vs. Non-Farmers) in the FB ads platform. DataLogix started out as a CPG play, since they powered the "put your phone number in for groceries" platform that everyone used in the mid-2000s. They then started doing tons of data aggregation and purchased a whole bunch. DL became way more powerful when they were bought by Oracle in '14, since they could do identity resolution across Oracle's cloud. E.g. when you go buy a car, you are likely using the Oracle Auto suite, which includes credit checks.

Exactly. For example, there are also persistent rumours that somebody is selling about 1% of total credit card transactions to hedge funds, for stock trading purposes - which seems like a major issue on all sorts of levels.

1%? Deidentified transaction data from banks is routinely sold https://www.abc.net.au/news/2019-03-05/sportsbet-documents-r...

If it's in aggregate and applied well, it can be immensely valuable. Some Capital One fraud researchers used similar data to get 1800% ROI a few years back:

https://www.bloomberg.com/opinion/articles/2015-01-23/capita...

Everyone with a credit card agreed to it. It's in the fine print. Don't want to be tracked shopping? Use cash and keep your cell phone turned off. And no modern cars with wireless comms either.