by cosmie 2224 days ago
For purely analytics systems, it's already pretty easy to leverage server-side calls or an internal proxy service to route the data, which does truly mask where the data is going (but should still be disclosed in your privacy policy).

And many of them offer a hybrid approach called CNAME cloaking[1], where you CNAME a subdomain on your host to the analytics/marketing system. That way you still leverage their infrastructure but they gain access to the first-party context. Here[2] is an example of that for Adobe Analytics.

Google Analytics doesn't officially support a CNAME implementation, so the above doesn't apply. The cookie service I referenced in the parent comment is a workaround for that. The GA code is still all third-party, but you create an internally hosted microservice that will set cookies when called. You hit that service (even via a third-party tool like Google Tag Manager), it sets an appropriately named first-party cookie in its response, and you suppress GA from triggering its usual cookie-setting behavior (which would overwrite the first-party cookie and get hit with ITP restrictions).
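To make the cookie service a bit more concrete, here is a minimal sketch of the part that matters: building the Set-Cookie header the microservice would return. It assumes the cookie is named `_ga`, that the site lives on the placeholder domain `example.com`, and that a two-year lifetime is wanted; the function name `build_set_cookie` is made up for illustration, and the HTTP framework around it is left out.

```python
import uuid
from http.cookies import SimpleCookie

TWO_YEARS_IN_SECONDS = 60 * 60 * 24 * 365 * 2


def build_set_cookie(existing_value=None, name="_ga", domain="example.com"):
    """Return a Set-Cookie header value for a server-set first-party cookie.

    The cookie name, domain, and lifetime are assumptions for this sketch.
    """
    # Reuse the ID the browser already sent, if any, so the visitor keeps
    # the same identity; otherwise mint a fresh random one.
    value = existing_value or uuid.uuid4().hex

    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["domain"] = domain
    morsel["path"] = "/"
    morsel["max-age"] = TWO_YEARS_IN_SECONDS
    morsel["secure"] = True
    morsel["samesite"] = "Lax"
    return morsel.OutputString()


if __name__ == "__main__":
    # The service would send this string as the Set-Cookie response header.
    print(build_set_cookie("GA1.2.123.456"))
```

Because the header comes from a server on your own domain rather than from script running in the page, it is the response, not the GA snippet, that owns the cookie.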

That said, ad networks are a different beast entirely. Some of them offer a CNAME cloaking implementation option, but it's less common than you'd expect. And few if any of them allow internal proxying of the tracking data. There's simply too much potential for fraud and too little trust between parties. "Offline" conversion tracking is pretty commonly supported though, which involves having a site capture a click identifier that an ad network appends to an ad click, then in an out-of-band process, upload conversion activity to the ad network using that click id as a key. Precludes the super invasive browser-side tracking, while still allowing for attribution and media effectiveness analysis.
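A rough sketch of that offline flow, under stated assumptions: the ad network appends a query parameter called `click_id` to the landing URL (real networks use their own parameter names), the site stores it against a visitor ID, and a batch job later builds the rows to upload. All function and field names here are hypothetical, and the actual upload call is omitted because it is network-specific.

```python
from urllib.parse import parse_qs, urlparse


def extract_click_id(landing_url, param="click_id"):
    """Pull the network's click identifier off the landing-page URL."""
    query = parse_qs(urlparse(landing_url).query)
    values = query.get(param)
    return values[0] if values else None


def build_upload_rows(conversions, click_id_by_visitor):
    """Join conversions to stored click IDs for the out-of-band upload.

    Only conversions that trace back to an ad click are included, and the
    click ID is the only key sent; no identity data is needed.
    """
    rows = []
    for conversion in conversions:
        click_id = click_id_by_visitor.get(conversion["visitor_id"])
        if click_id is None:
            continue  # no ad click recorded, so nothing to report
        rows.append({
            "click_id": click_id,
            "converted_at": conversion["converted_at"],
            "value": conversion["value"],
        })
    return rows


if __name__ == "__main__":
    stored = {"v1": extract_click_id("https://example.com/?click_id=abc123")}
    conversions = [
        {"visitor_id": "v1", "converted_at": "2020-03-01T12:00:00Z", "value": 49.0},
        {"visitor_id": "v2", "converted_at": "2020-03-01T13:00:00Z", "value": 20.0},
    ]
    print(build_upload_rows(conversions, stored))
```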

[1] https://dev.to/dnsadblock/cname-cloaking-or-how-are-we-being...

[2] https://docs.adobe.com/content/help/en/id-service/using/refe...


How are you seeing companies solve for view through tracking in this landscape?
From which perspective?

For ad networks: the solve seems to be more disclosure from advertisers of customer PII[1]. All of the recent tracking prevention and cookie restriction measures are disruptive to last-mile analytics, but are far less disruptive to the overall strengths of device and identity graphs. Offline/out-of-band data feeds for click-through tracking don't require passing any identity data, since the click id acts as a key to connect the result to the associated click action. You don't have that for view-through, so instead you pass identity data and that's used to associate the result with the network's device/identity graph and attribute it to any relevant view through action that occurred. But because the advertiser is blind to which conversions/results may be relevant, they have to disclose all results and associated identity data in the process.

Caveat emptor: The above is based on my observations working in digital analytics in general, but my primary focus is in a different area. So there may be nuances or aspects that I'm not cognizant of.
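For contrast with the click-id feed, here is a sketch of what a view-through feed tends to look like: with no click id to act as the key, every conversion goes out with a hashed identifier. It assumes the identifier is an email address, normalized by trimming and lowercasing and then hashed with SHA-256; actual normalization rules and field names vary by network, and the names below are hypothetical.

```python
import hashlib


def hash_email(email):
    """Normalize an email address and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_view_through_rows(conversions):
    """Build the feed of ALL conversions, keyed by hashed identity.

    The advertiser cannot tell which conversions followed an ad view, so
    nothing is filtered out; the network does the matching on its side.
    """
    return [
        {
            "hashed_email": hash_email(conversion["email"]),
            "converted_at": conversion["converted_at"],
            "value": conversion["value"],
        }
        for conversion in conversions
    ]


if __name__ == "__main__":
    feed = build_view_through_rows([
        {"email": " User@Example.com ", "converted_at": "2020-03-01T12:00:00Z", "value": 49.0},
    ])
    print(feed)
```

Hashing is deterministic, so anyone who already holds the same email can compute the same digest, which is why it obfuscates the data rather than anonymizing it.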

For advertisers: The three main options tend to be blissfully ignoring (or consciously accepting) the visibility gap, moving towards the greater data disclosure of the above solution if you (and your lawyer) are comfortable doing so (which maintains the status quo for visibility while disclosing significantly more data to ad networks), or investing internally in the tech and resources to perform the tracking yourself (which gives the advertiser visibility, but keeps the ad network blind). For option three, you can self-host something like Snowplow[2] and abuse it as a poor man's ad server for tracking purposes. The CloudFront implementation model gives you the throughput and latency to use it as a view-through pixel, and you can then put it on any placements/networks that allow an advertiser-provided tracking pixel.
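As a rough illustration of option three, this sketch builds the impression pixel you would hand to a placement. Everything specific in it is an assumption: the collector host `collector.example.com`, the `/pixel` path, and the `placement`, `campaign`, and `cb` parameter names are placeholders rather than Snowplow's actual tracker protocol, which you would take from its documentation.

```python
import html
import random
from urllib.parse import urlencode


def build_pixel_url(collector_host, placement_id, campaign_id, cache_buster=None):
    """Build the URL of a 1x1 impression pixel on a self-hosted collector.

    The path and parameter names are hypothetical placeholders.
    """
    if cache_buster is None:
        # A random value keeps browsers and proxies from serving a cached
        # copy, which would hide repeat impressions from the collector.
        cache_buster = str(random.randint(0, 10**9))
    params = urlencode({
        "placement": placement_id,
        "campaign": campaign_id,
        "cb": cache_buster,
    })
    return f"https://{collector_host}/pixel?{params}"


def pixel_tag(url):
    """Wrap the pixel URL in an <img> tag suitable for an ad placement."""
    escaped = html.escape(url, quote=True)
    return f'<img src="{escaped}" width="1" height="1" alt="" />'


if __name__ == "__main__":
    url = build_pixel_url("collector.example.com", "homepage-banner", "spring-sale")
    print(pixel_tag(url))
```

Each request for that image is logged by your own collector as a view, so view-through attribution can be done against your own conversion data without the ad network seeing it.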

[1] In hashed form, for what it's worth. The liability of disclosing raw PII is too black and white for comfort, but the security theater of obfuscated PII via hashing hasn't been tested in court well enough to put a dent in that practice.

[2] https://snowplowanalytics.com/

I was actually curious about all of them as I've been both buy and sell side. But I'd like to dig further on the buy side.

I've seen what happens with the consciously accepting route, and it's not pretty. It furthers distrust of a channel that many already have inherent concerns about and understand to varying degrees, which is a dangerous combo.

Giving the data is what I suspect many will do unless they have sufficient resources for said legal team and technology, or care a great deal about leaking data.

The last option is interesting. So I've used Snowplow (was actually an early user that sponsored them adding UTM remapping). I'm curious how you approached using it as a poor-man's ad server with Cloudfront which I'm less familiar with. Are there any technical write-ups you could point me to?

This was a big part of the reason we built our repo https://github.com/posthog/posthog to enable first party analytics. It'll grab UTM tags all the way through to individual user behavior in your app and then provides an analytics UX on top. Disclaimer: I'm one of the founders.
But can you do anything about view-through data?