Hacker News
by jesseryoung 1847 days ago
I've got a question for anybody who works in ad/marketing tech: is what Robert is describing something that you've worked on/with and seen successful results from? If so, did you build it that way intentionally? Like, I totally understand that it's possible, but has anybody intentionally built something that tracks or correlates people's locations so they can group them with others who have similar interests and sell them similar products?

To me, the stupidest simplest solution is probably the most likely - some naive marketing analyst probably just grouped all traffic coming from the same IP address into the same bucket and blasted ads to them based on recent Amazon purchases at the same IP address.
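The naive IP-bucketing approach described above fits in a few lines of Python. This is a minimal sketch under assumed inputs: the log format, `events` and `ads_by_ip` are hypothetical, and the IPs come from the RFC 5737 documentation ranges.

```python
from collections import defaultdict

# Hypothetical log rows: (public_ip, user_id, purchased_item or None).
events = [
    ("203.0.113.7", "alice", "headphones"),
    ("203.0.113.7", "bob", None),
    ("198.51.100.2", "carol", "tent"),
]

def ads_by_ip(events):
    """Naive targeting: every user seen on an IP gets ads for
    everything purchased from that IP (including their own buys)."""
    users_by_ip = defaultdict(set)
    purchases_by_ip = defaultdict(set)
    for ip, user, item in events:
        users_by_ip[ip].add(user)
        if item is not None:
            purchases_by_ip[ip].add(item)
    targets = defaultdict(set)
    for ip, users in users_by_ip.items():
        for user in users:
            # Users seen on several IPs accumulate purchases from all of them.
            targets[user] |= purchases_by_ip[ip]
    return dict(targets)
```

With these rows, bob gets headphone ads purely because he shares alice's IP. There is no model involved, just a join on IP address.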

12 comments

Yep, this is called cross-device matching. Generally consists of some modeling for devices seen together on the same IP address. One of the notable AdTech companies in cross-device modeling is Drawbridge (purchased by LinkedIn).

Here's a 2015 Kaggle competition that they hosted, which provides sample data that they use in modeling, https://www.kaggle.com/c/icdm-2015-drawbridge-cross-device-c...

And here's a technical writeup of one of the well-performing solutions from that competition, https://arxiv.org/pdf/1510.01175.pdf
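A crude version of the "devices seen together on the same IP" idea can be sketched as follows. This is a simple co-occurrence heuristic that counts how many distinct IP addresses two devices share. It is not Drawbridge's model or the competition solution linked above, and the function name and data shape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def candidate_device_pairs(observations, min_shared_ips=2):
    """observations: iterable of (device_id, ip) sightings.

    Returns device pairs seen on at least `min_shared_ips` distinct
    IP addresses, mapped to the shared-IP count as a crude score."""
    devices_by_ip = defaultdict(set)
    for device, ip in observations:
        devices_by_ip[ip].add(device)  # set: repeat sightings count once
    shared = defaultdict(int)
    for devices in devices_by_ip.values():
        # Sort so each unordered pair always gets the same key.
        for a, b in combinations(sorted(devices), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared_ips}
```

A phone and a laptop that both show up at home and at a coffee shop score 2. A tablet seen only at home scores 1 with each of them.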

Tapad are probably bigger than Drawbridge.
It works, and it's awful. I was looking for a present for my girlfriend, and she started seeing ads for the things I was looking at. About a week later, she was excited to show me her new purchase... and I'm scrambling to find a new gift idea. And now I'm paranoid -- it seems that the only way to stop this is to make a cash purchase in meatspace.
Use Firefox and install uBlock Origin.

Your credit card company will still sell you out, but that takes a little more time and only includes one item (rather than your entire browsing history). Meatspace cash would help with that, but I think that's much less of a problem in your context.

I guess what really baffles me here is that a CC issuer is allowed to sell that data at all. Just. Wow.

I’m not trying to show off my European high horse, it’s not like we don’t have our own problems.

I use uBlock Origin and have an iPhone. Unfortunately, getting an ad blocker there is not as trivial, so inertia took over, and I see ads in the YouTube app and when generally browsing the web on the iPhone.

I've noticed that over the course of the last year, either some really sophisticated newer algos are being put to use, or the collaboration and sharing of information between ad networks has been streamlined or increased in some manner, because I'm being served ads that are creepily relevant. In any case, the clues and data you leave behind are plentiful and quite susceptible to being compromised and pounced on by ad networks. I think at this point, if you wanna play tango, don't just play defense (ad blocking); go on the offense as well and use AdNauseam to pollute the profiles they've built of you.

I also want to articulate the annoyance I feel when being served targeted ads. If an ad is related to my interests, even tangentially, it does grab me, and no doubt it compels me to make some decision one way or the other. What gets me, I believe, is both the mental overload of being served ads for "relevant" things, which attract too much of my attention, clutter my mind and distract me, and the sheer arrogance of pushing things it believes are relevant to my interests.

AdGuard on iPhone works alright; it hooks into Safari's content blocker API. It's not as effective as a proper blocker on Android, but it does improve the experience.

When I'm out and about, I also route my iOS devices over WireGuard to my home network, which runs a Pi-hole DNS server. It works surprisingly well and also catches ads in apps that way.
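DNS-level blockers like Pi-hole work by refusing to resolve ad and tracker domains. That is why they also catch ads inside apps: every app's lookups go through the same resolver, and over the WireGuard tunnel that resolver is the one at home. Below is a minimal sketch of the blocking decision. `is_blocked` and `answer` are hypothetical names. Blocking a listed domain plus all of its subdomains is an assumption; Pi-hole's own lists may match only exact domains unless you add wildcard or regex rules. Answering with a null address such as 0.0.0.0 is one common approach.

```python
from __future__ import annotations

from typing import Callable

def is_blocked(domain: str, blocklist: set[str]) -> bool:
    """True if the domain or any parent domain is on the blocklist
    (assumed suffix matching, not necessarily Pi-hole's default)."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

def answer(domain: str, blocklist: set[str], upstream: Callable[[str], str]) -> str:
    """Sinkhole blocked names to a null address; forward everything else."""
    if is_blocked(domain, blocklist):
        return "0.0.0.0"
    return upstream(domain)
```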

On Snapchat, I took a picture of some headphones a friend had left in my car. Over the next week, I started seeing ads for that exact model. Not a fun user experience.
I have an iPhone with AdBlock Pro, and I use NextDNS on all my devices. I almost never see ads with NextDNS (paid version), so for me it works really well.

Sometimes it's "annoying", because links I click in articles and emails are blocked. Then I can choose to give up or to disable NextDNS for that one time, but it's my choice whether to be tracked.

> have an iphone.

Well, I would postulate that targeting iPhone users would be the numero uno priority at any self-respecting adtech company, since it's a strong signal that marketing does in fact appeal to you more strongly and that you likely have a lot of "spare change"...

Magic Lasso works well on iPhone/iPad, and so does Firefox Focus. I have both installed; I'm not sure how they divide the work, but I hardly ever see an ad in Safari or Firefox on iOS.

(They don’t stop YouTube from showing ads)

That won’t help with IP tracking. Buying presents from work sounds like a better option. Assuming we ever go back to work.
uBlock will presumably block the tracking code, if it's a third-party tracker.
Don't forget a VPN, a new email account and a new phone number for "2fa". Also, where is it getting shipped? I can't receive packages at work. The "convenience" of shopping online is a legend from my youth.
And by meatspace cash, it has to be pieces of paper and metal. If you use a debit card, the payment network knows anyway. And that might not even be good enough, if you carry your phone, they have your location at that time, so if someone really wanted to, it's probably not even hard to correlate the relatively rare cash purchase at that exact time and place and know it was you anyway.
Will Firefox and uBlock Origin prevent my IP address from being discovered? Sibling posts indicate this was probably accomplished via IP address targeting.
It will block all the third parties that have anything to do with retargeting, like Facebook, Google, AppNexus, etc.

It's unusual for sites to conspire directly and share data about IPs (but that may change).

You could just buy a generic giftcard (like Amex one) and use it to make the actual purchase.
From the AmEx giftcard holders agreement:

> We also use Cardholder Information for marketing purposes and to conduct research and analysis. We may provide certain Cardholder Information to companies, including our affiliated companies that perform business operations or services, including marketing services, on our behalf. We may provide certain Cardholder Information to others outside of American Express as permitted by law, such as to government entities or other third parties in response to subpoenas. We may develop marketing programs and send you offer for products and services. We do not share customer addresses with other companies for them to market their own products and services.

https://assets.ctfassets.net/2x5vcnvffh4i/7it0e2T8WQ8fl4DmkL...

True. But if I buy a gift card with my regular CC, my CC record would have a line that says "purchased a gift card", but I don't think the actual gift card number would be there? And the GC data would show whatever I bought, but there's nothing linking the GC number to me, is there?
Yeah, I guess it depends on the level of data sharing and how good AmEx is at identifying its users based on other data points. Either way they are explicitly stating they're gathering data and passing it on in a much more standardized and defined form than they're willing to share with the consumer. Entities buying the data are probably throwing a lot of capital into joining datasets on a macro level.

Another comment mentions privacy.com as a solution. I've actually thought of creating a little terminal program to leverage it, because it's a neat product and it's super cool that they maintain a well-defined API for it.

All this ultimately bums me out though. Jumping through so many hoops to avoid this intrusive (and increasingly default) behavior can't be good for mental health. Plus where do you draw the line? When it's so widespread and largely unaccountable while everyone is saying it's up to the individual to avoid it, it really starts to feel rather quixotic trying to take measures to protect yourself.

You should suggest that she install uBlock Origin. Not just for this problem; in general, it's good practice.
She's the IT expert of the house. I don't tell her what to install or how to manage our network. If anything, I should put a Pi-hole on my wishlist, but even that wouldn't solve the problem that all of our metadata is correlated, and nothing blocks first-party tracking.
> but even that wouldn't solve the problem that all of our metadata is correlated, and nothing blocks first-party tracking

That's true – nothing stops Google from knowing what you were looking for – but if your girlfriend was seeing ads, she wasn't using uBlock, because it blocks all first-party ads, too.

I think a much bigger problem here is that almost nobody uses Firefox for Mobile. Also, uBlock doesn't block ads across native apps (for instance, YouTube).

The solution is to use something like NextDNS as your DNS provider at the OS or router level. At least on Android 9+ and most recent Linux distributions (via systemd-resolved), no additional software is required for it to work.

Personally, I don't see much of a difference between recommendations and ads. And in this context, the distinction is moot. uBlock doesn't hide Amazon recommendations, does it?
That's a good point. But in such a case, it's neither cross-site tracking nor ads. It's just Amazon's recommendations based on a shared IP address.

uBlock can be used to block both "Sponsored" products and Amazon's recommendations. But it won't help when using Amazon's native apps, which many people probably do.

You either have to use a VPN to hide your IP address (Mullvad seems to be trusted even by Mozilla), or at least switch to your mobile 4G/5G connection when doing anything more privacy sensitive.

Reminds me of when I was getting relentlessly retargeted ads to purchase something. So when I did buy it, I paid cash to keep the ads coming and to thwart any attempts at offline attribution.

I'm guessing the present wasn't a Pi-hole or a VPN?

Yet another reason why ad blocking is ethical.
How does purchasing in a physical store prevent them from using your tracked online activities to target you and others?
As you suggest, it doesn't prevent tracking of my online behavior. That's a lost cause. Cash allows me to hide select purchases, as long as I don't do any comparison shopping online. And that prevents disclosure of gift purchases to my housemates.
Yes. Many times, and at scale.

While not strictly accurate, it's easiest to think about it as a simple machine learning system. The system can't be interrogated, so you don't really know what correlations are being made.

The actual way it works is in layers. There's a human layer, using logic to create segments or other targeting methods. There are the ad network's automated optimisation options; FB really took this to the next level. There's retargeting. Bidding and the economics of advertising play a big role in giving the system intelligence. And there's third-party ad management software.

Each piece/layer typically adds additional data to the set. The human/advertiser generally does this by uploading or tagging their own customers. FB, for example, will let you create a "similar" list, where it finds users similar to those you designate. Similarity is somewhat ambiguous. FB/AdWords is where the heavy lifting happens, most commonly via bid optimisation.

The only intention is "goals per $": price and volume. As I said, the sausage factory is complex, and no one sees the whole thing. In practical terms, a massive NN optimizing for sales/signups/etc. is a decent analogy... and increasingly not an analogy.
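As a toy illustration of optimising for "goals per $", here is a naive rule that shifts the next budget toward ad sets with more conversions per dollar. Real platforms optimise bids per auction with models that nobody outside can see. `AdSet` and `shift_budget` are hypothetical, and proportional reallocation is a deliberately simple stand-in.

```python
from dataclasses import dataclass

@dataclass
class AdSet:
    name: str
    spend: float      # dollars spent so far
    conversions: int  # sales, signups, etc.

    @property
    def goals_per_dollar(self) -> float:
        return self.conversions / self.spend if self.spend else 0.0

def shift_budget(ad_sets, total_budget):
    """Split the next budget in proportion to each ad set's goals per dollar;
    fall back to an even split when nothing has converted yet."""
    scores = [a.goals_per_dollar for a in ad_sets]
    total = sum(scores)
    if total == 0:
        return {a.name: total_budget / len(ad_sets) for a in ad_sets}
    return {a.name: total_budget * s / total for a, s in zip(ad_sets, scores)}
```

An ad set converting three times as well per dollar gets three times the next budget, which is the feedback loop in miniature.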

Fascinating. Any clue as to what the largest factor for "similarity" is, and how much it contributes?
These tweets are a pretty decent sample, though I suspect these factors (association with other phones/users and such) are more active in bid optimisation than in list generation. Hands-off stuff is gradually overtaking the "hand-coded" elements. These, I imagine, can take advantage of a wider set of heuristics.

List generation is dumber and feels more hand-coded: basic demographics, Facebook/Instagram interests.

How is this going to work with carrier grade NAT?

Edit: commercial to carrier, thanks justusthane

Just FYI, it's "carrier-grade NAT". And there are a lot of ways to associate people with each other other than their public IP address. The linked twitter thread doesn't even mention IP addresses, neither does the comment you responded to. I suspect IP addresses are already a pretty inaccurate way to link people with each other.
It likely doesn’t for IPv4 now that everyone has switched to HTTPS.

Many ISPs used to insert “client id” and other uniquely identifying information while NATting/proxying. Luckily, they can’t do that for https - but I wouldn’t put it beyond them to sell a back channel “connection xyz is unique user abc” service.

However, with the move to IPv6, at least in my area, NAT is gone and static assignment is in. You just need to know the ISP's prefix length, and you get a unique identifier.
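To make the prefix point concrete: the ISP delegates a prefix to each customer, and every device in the home takes addresses inside it. Truncating an address to that prefix therefore gives a stable per-household key, even when privacy extensions rotate the low 64 bits of each device's address. The sketch below uses Python's standard `ipaddress` module. The /56 default is an assumption, because delegation sizes vary by ISP (/48, /56, /60 and /64 are all seen), and `household_prefix` is a hypothetical name.

```python
import ipaddress

def household_prefix(address: str, prefix_len: int = 56) -> str:
    """Truncate an IPv6 address to the (assumed) delegated customer prefix.

    strict=False lets us pass a full host address; the host bits are zeroed."""
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(network)
```

Two devices in the same home, such as `2001:db8:1234:5678::1` and `2001:db8:1234:56ff:abcd::2`, both map to `2001:db8:1234:5600::/56`.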

I think the question isn't so much whether it works with CGNAT (as in, you're suggesting it just doesn't work) but rather how well it works with it.

Do you have numbers on how many consumers in, say, North America, various European countries, etc. are behind CGNAT? I would guess most of it is used by mobile carriers (but I have no data), and I would gather that this particular technology is not going to be used to try to associate random mobile users anyway. It's more about who likely lives in the same household.

Google has that covered, since many of those behind CGNAT are also on networks with native IPv6. Many mobile networks have already made the switch: dual stack to the handset, with native IPv6 and IPv4 handled via CGNAT.

Google's interest in IPv6 isn't entirely altruistic, after all…

There are other identifiers that can work cross-device other than IP. Basically anything tied to you (identifying or not) that exists on both devices can be used.

Have you logged into a service or websites on multiple devices before? Then you voluntarily gave them enough data to link it.

Heuristics. Marketers do cross-device correlation only when they see a small number of devices (i.e., devices that look like they could belong to the same household). If they see hundreds of devices behind the same IP, it's probably a larger entity (i.e., a company office).
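The device-count heuristic described here might look something like the following. The cutoff of 10 devices is an arbitrary assumption, and `household_ips` is a hypothetical name. Real systems presumably weigh timing and other signals too.

```python
from collections import defaultdict

def household_ips(observations, max_devices=10):
    """Keep only IPs that plausibly belong to a single household.

    observations: iterable of (device_id, ip). IPs with more than
    `max_devices` distinct devices are treated as offices (or similar
    shared networks) and excluded from cross-device matching."""
    devices_by_ip = defaultdict(set)
    for device, ip in observations:
        devices_by_ip[ip].add(device)
    return {ip: devs for ip, devs in devices_by_ip.items() if len(devs) <= max_devices}
```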
My wife works with digital ads (on the sales side, not tech), and the geofencing products they offer go something like this: they have a bucket of tracked people who went to a car show or a Dodge dealer, and they can then push ads to them from a local Toyota dealer, or whoever her customer is. They can further track and determine how many of those people actually went to the advertised dealer.

They did a comparison for one dealer: out of 150 people who were pushed ads for the dealership, 12 ended up buying cars there afterwards. On a high-dollar purchase, that's a pretty significant conversion rate.

Ads are a pretty good proxy for how much profit a sale brings. While a car is a high-dollar purchase, moving your $40/month cellular plan from one provider to another means thousands of dollars in lost profit for one provider and thousands of dollars in profit for the other.
Did they compare to the similar group of people who did not get ads?
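For the dealership example, the raw rate is 12 / 150 = 8%. The comparison asked about here is usually expressed as lift over a holdout group that saw no ads. Below is a minimal sketch. The function names are hypothetical, and the thread gives no control-group numbers, so those would have to be measured.

```python
def conversion_rate(conversions: int, audience: int) -> float:
    return conversions / audience

def lift(exposed_conv: int, exposed_n: int, control_conv: int, control_n: int) -> float:
    """Relative lift of the exposed group's conversion rate over a holdout group.

    0.0 means the ads made no measurable difference; 1.0 means double the baseline."""
    control = conversion_rate(control_conv, control_n)
    if control == 0:
        raise ValueError("control group has no conversions; lift is undefined")
    exposed = conversion_rate(exposed_conv, exposed_n)
    return (exposed - control) / control
```

Without the holdout, 8% could simply be the rate at which people who visit car shows and dealers buy cars anyway.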
These days this is a basic offering of any adtech company and is full of quite a lot of BS.
> To me, the stupidest simplest solution is probably the most likely - some naive marketing analyst probably just grouped all traffic coming from the same IP address into the same bucket and blasted ads to them based on recent Amazon purchases at the same IP address.

I agree with this as well. I've been living back at my parents' house for a bit while I'm between properties, and I definitely see ads for stuff targeted at my parents. Sometimes I worry there might be a privacy breach there; e.g., my dad has been suffering from a condition recently, and I've seen ads for coping with it come up on my computer, most likely based on his Google searches or whatever.

Ultimately, all attempts at attribution are heuristic in nature. Marketers know a single IP doesn't represent a single person, but if it's the best they can do, it's the best they can do. Even for services with accounts, tracking can't be perfect. When my wife's phone or laptop is closer than mine and she's logged into Amazon Prime or Uber Eats, I'm ordering through her account. Now she's gonna see ads for gym equipment she has no interest in. Oh well. It doesn't matter how good your location, device, or browser fingerprinting is when people share locations, devices, and browsers. The only way to know it's really me is to get my actual fingerprint or some other truly unique biometric identifier.
Let me tell you this: I'm from the Flemish part of Belgium. YouTube and other sites with ads can't figure out that I don't speak French.

So even with this simplest of use cases (GPS says I live in the Flemish part, I never search in French, etc.), they still sometimes show me French ads, which is a total waste, of course.

So I don't believe the tech is so crazy advanced already.

A surprising number of people think Belgium is majority French-speaking. While I understand Belgium doesn't feature much in foreign curricula, I blame that for this default across the many services that get it wrong. Also, non-Belgians I meet rarely know that two-thirds of Belgians speak Dutch rather than French.
It's a little different. Targeting these days is more and more machine learning driven. So it's not really someone sitting down and saying "show an ad to anyone who stayed at a house with someone who bought this toothpaste". Rather, a bunch of data flows into Facebook and it uses those signals to decide who should see what. It's not a naive analyst. It's a statistical engine (and yes, that engine can sometimes be naive, and it's working off of really noisy data).

For example, any good Facebook marketer probably uses "lookalike audiences". You upload some existing customers and then tell Facebook to show ads to people who are "like" your customers. Facebook then uses whatever data it has to find similar users (demographic, interest, geographic, behavior, etc.).

In fact, lookalikes can be so good that any good marketer also knows to _exclude_ existing customers from the lookalike audience (unless you're actually retargeting your existing customers).
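Here is a rough stand-in for the lookalike idea, including the exclusion step. It scores each user by cosine similarity to the average of the seed customers' feature vectors and drops the seeds themselves from the result. Facebook's actual lookalike model isn't public. The feature vectors, `cosine` and `lookalike` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def lookalike(users, seed_ids, top_k=2):
    """users: dict of user_id -> feature vector (e.g. interest scores).

    Ranks every non-seed user by similarity to the seed centroid, so
    existing customers are excluded from the returned audience."""
    seeds = [users[u] for u in seed_ids]
    centroid = [sum(col) / len(seeds) for col in zip(*seeds)]
    candidates = [(cosine(vec, centroid), uid)
                  for uid, vec in users.items() if uid not in seed_ids]
    candidates.sort(reverse=True)
    return [uid for _, uid in candidates[:top_k]]
```

Dropping the `uid not in seed_ids` filter would put existing customers back into the audience, which is only what you want when you are deliberately retargeting them.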

> seen successful results

For about a week now, about 80% of the ads on the YouTube videos I've watched on my Android TV have been for tampons, body hair removal machines (legs, not beard) and similar products. My SO never uses this device or the account.

I've disabled personalized ads, so YouTube tells me it's showing me these ads mainly due to time of day and the type of video I'm watching...

I can't be certain, but I'm pretty confident the number of tampon users watching videos about repairing parts for earth movers at 2am is rather low compared to those not using tampons...

So while others may have cracked the code, YouTube certainly has not.

When I was shopping for wedding rings, I started seeing ads for Peoples when streaming on TV and on my SO's own phone. This despite the fact that I usually browse with an ad blocker and NoScript, and delete cookies. The tracking is remarkably invasive.
How do you know your SO wasn't getting ads for wedding rings simply because she was thinking about getting married too?
First, the timeline; second, she confirmed she wasn't shopping around for or researching wedding-related items.
> To me, the stupidest simplest solution is probably the most likely - some naive marketing analyst probably just grouped all traffic coming from the same IP address into the same bucket and blasted ads to them based on recent Amazon purchases at the same IP address.

Yes, the dirty secret of basically all discussions about tracking on the internet is that IP+User-Agent is a pretty good baseline that is commonly used.
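The IP + User-Agent baseline amounts to a cookieless key like the one below. Hashing only makes the key compact; it anonymises nothing, because anyone holding the same two values derives the same ID. The function name is hypothetical.

```python
import hashlib

def baseline_visitor_id(ip: str, user_agent: str) -> str:
    """Pseudo-identifier from IP + User-Agent; no cookies involved."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()[:16]
```

This also shows why households blur together: two people on the same Wi-Fi whose browsers send an identical User-Agent string collapse into one "visitor".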

A lot of this is described in the book ‘The Age of Surveillance Capitalism’ by Professor Shoshana Zuboff [1]. There’s also a good documentary on Netflix (I forget which; I think it’s ‘The Great Hack’ [2]) explaining how the ‘Cambridge Analytica’ scandal utilised personal data and, more importantly, behaviour.

They’re just scarily good at predicting what you are going to do. They’re not listening in. It’s far scarier/more insidious than that.

[1] https://en.wikipedia.org/wiki/The_Age_of_Surveillance_Capita...

[2] https://www.netflix.com/gb/title/80117542

Why is this downvoted? It provides useful information with sources.