Hacker News
How to track users for analytics in a privacy-first, cookie-less future (narrator.ai)
65 points by mattjstar 1835 days ago
18 comments

We do it a bit differently (French company). Since the only cookie that is endangered is the « third party cookie », it is very much ok to store anonymous session information in a first party cookie for all anonymous visitors. So we store page views and utm there, and capture this data in the datawarehouse when (and only when) there is a conversion. This is also working with returning visitors (who most likely kept the first party cookie).
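For what it's worth, here is a minimal sketch of how that flow could look, assuming the first-party cookie holds a small JSON list of page views. The function names, the cookie layout, and the row shape are all made up for illustration and are not the commenter's actual setup.

```python
import json
from urllib.parse import urlparse, parse_qs


def record_page_view(cookie_value, url):
    """Append a page view (path plus any utm_* params) to the JSON cookie payload."""
    views = json.loads(cookie_value) if cookie_value else []
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Keep only the utm_* parameters; everything else in the query is ignored.
    utm = {key: values[0] for key, values in query.items() if key.startswith("utm_")}
    views.append({"path": parsed.path, "utm": utm})
    return json.dumps(views)


def on_conversion(cookie_value, user_id):
    """Only at conversion is the buffered history turned into warehouse rows."""
    views = json.loads(cookie_value) if cookie_value else []
    return [{"user_id": user_id, **view} for view in views]
```

Nothing leaves the browser until `on_conversion` runs, which is the "when (and only when) there is a conversion" part.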
Note that you still need to get consent for the cookie in this case, as the cookie is being used for something which isn't strictly necessary to provide the service.
I think this is kind of an interesting question actually. If this cookie is entirely separated from the rest of the experience (e.g. _never_ gets associated with a logged in cookie, or IP address, etc.), is it really tracking the user? It's more like tracking article association. I agree it's not strictly necessary to provide the service, but is it necessarily tracking users at all? Another similar approach would be to keep the client's IP address as a similar key, but in that case the IP address can often be used to (at least closely) identify the client, whereas if the UUID is randomly generated it's a bit different.

I mean my gut feeling is that you're correct, but I kind of wonder about this case.

edit: A cursory reading of this site makes me think you are correct:

https://www.privacypolicies.com/blog/eu-cookie-law/

If the cookie is never used until a later date e.g. conversion, when the user clicks through an agreement, do you still need consent?

Edit: I honestly have no idea, I haven't read the regulations and I'm curious if any experts know. Seems sleazy regardless!

What does it mean? Even a session cookie is used at a later date, e.g. 5 minutes later. The law does not specify minimum retention time.
What if the cookie is also used for feature toggles?
Love it! This is super similar but we allow many different layers of conversions. So anytime we bridge systems you can connect the user. This is critical with the many browsers, devices and so on. In e-commerce we see that emails have such high conversion rates, but they are often opened on the phone and then the user buys in the browser without clicking. These situations are captured by doing the many layers of identity resolution.
> Since the only cookie that is endangered is the « third party cookie »

Data protection regulations (esp. GDPR) are totally unconcerned with the distinction between first and third party cookies. They are concerned with data collection permissions and scopes, regardless of the technology used.

If you are capturing information which is not essential to the service/product you are offering at that moment and in that session, then you need specific permission - even for your own cookies. And if you did not have that permission at the time it was collected then you cannot merge it into records after conversion.

So the overall concept of "shove a tracker value into the URL and collate all interactions" makes sense - but how do you track if a user is sharing a URL?

Let's say that I'm on a desktop browsing a shopping site. I'm on shopping.site/product/coolthing.html?tracker=12345. I share this with my friend on a mobile device because it looks like something of interest to them.

Now how do you handle the other person having the same tracker as the initial person? You end up with a scenario where two different people, with different interests, are browsing the site. Even if they convert you have situations along the lines of: no conversions, person A converts, person B converts, both convert. How do you handle this?

With cookies the sharing of the URL would avoid this scenario since cookies would be separated between people.

Use the referer, is my guess. Whereas browsers are starting to limit the sharing of referers, I don't think there are any plans to limit first party referers. So as long as the referer is your domain, you know the user navigated from within the site. Otherwise, the user followed a link offsite, likely shared by a friend. You can then assign a different tracking id to them.
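A rough sketch of that referer check, purely as a guess at how it might be done. `SITE_HOST` and `resolve_tracker` are hypothetical names, and the host is borrowed from the example URL in the question above.

```python
import uuid
from urllib.parse import urlparse

SITE_HOST = "shopping.site"  # hypothetical; taken from the example URL in the question


def resolve_tracker(url_tracker, referer):
    """Keep the tracker from the URL only if the visitor navigated from within the site."""
    referer_host = urlparse(referer).hostname if referer else None
    if url_tracker and referer_host == SITE_HOST:
        return url_tracker
    # Missing or off-site referer: treat it as a shared link and mint a new id.
    return uuid.uuid4().hex
```

One caveat: a returning visitor who opens a bookmarked URL sends no referer either, so under this sketch they would also get a fresh id.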
> Whereas browsers are starting to limit the sharing of referrers

AFAIK, they are starting to standardize the sharing of referrer. For most people, this is limiting the sharing. However, they used to have an easy to find GUI option to just opt out of referrers in general. That is no longer the case.

This could in theory happen, but in my examples I'm adding the url right after someone converted -- paid for a subscription or completed an order. Those are unlikely to be shared with someone else (ideally). It's arguably more likely that the user will share it with themselves on another device, in which case the overall approach will work well.

I should also point out that the url tracker isn't meant to be persisted across page views. It's only done once at the moment that the user identifies themselves to your service.

Then I'm a little lost. I had thought a big part of this (your "Stitch anonymous data to users once they convert" picture and around it) was to be able to backtrack anonymous users once they identify themselves.

Even if they identify themselves via ordering something, is it an unusual workflow to share a link after? For example "I got this new coffee, I'm excited, here's the link to what I ordered, my friend!"

Well, you're tracking a user via the anonymous id. Once you see an identifying event (checkout url, order link, form submission, etc.) you create a link. Now you have a list of cookies and their linked email at a moment in time. Then you create a table that has the cookie and who it maps to from a timestamp to a timestamp. This is then used to update the past and future identities. Think multi-user, multi-device in time.

So in the example you gave, the user who opens that link becomes tied to that cookie from the time they open the order to the next linked event. This is really critical because it will continue to stitch the user's identity over time.
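To make the cookie-to-identity table concrete, here is a minimal sketch under a few assumptions: link events arrive as `(cookie_id, email, timestamp)` tuples, timestamps are comparable values, and the first link for a cookie also covers that cookie's earlier activity (my reading of "update the past"). The function names are hypothetical.

```python
from collections import defaultdict


def build_identity_intervals(link_events):
    """link_events: iterable of (cookie_id, email, timestamp).

    Returns rows (cookie_id, email, valid_from, valid_to); None means open-ended.
    """
    by_cookie = defaultdict(list)
    for cookie_id, email, ts in link_events:
        by_cookie[cookie_id].append((ts, email))

    rows = []
    for cookie_id, links in by_cookie.items():
        links.sort()
        for i, (ts, email) in enumerate(links):
            # Assumption: the first link is backfilled over the cookie's earlier history.
            valid_from = None if i == 0 else ts
            # Each link holds until the next linked event on the same cookie.
            valid_to = links[i + 1][0] if i + 1 < len(links) else None
            rows.append((cookie_id, email, valid_from, valid_to))
    return rows


def resolve(rows, cookie_id, ts):
    """Look up who a cookie mapped to at a given time, or None if unknown."""
    for row_cookie, email, start, end in rows:
        if row_cookie == cookie_id and (start is None or start <= ts) and (end is None or ts < end):
            return email
    return None
```

In the shared-link case, the friend's open would show up as a second link event on the same cookie, and the cookie maps to them from that timestamp onward.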

If link sharing is happening a lot, you can choose to not use that linkage for identity resolution.

Does this help clarify the approach?

> Said another way, our customers are able to stitch historical anonymous data to 95% of their converted users–this is even in the last few weeks after all the Apple device and browser privacy updates.

When companies talk about how they anonymize data before uploading it to the cloud, so it can’t be traced back to you, they should get sued. It is so easy to connect data to their sources with just a little bit more information

> If you're unable to set a consistent cookie across your user's many sessions (especially for a high retention business like e-commerce), or your javascript conversion events (Google Tag Manager for example) are being blocked, your user's historical behavior will be extremely difficult to stitch together over time.

Yes, that is in fact the point.

Look, I know there are strong financial incentives to build individual user profiles and doing it this way may not violate the letter of the law, but it sure as hell violates the spirit. If we ask a user if they're willing to be tracked and they do everything in their power to tell us no then I'm not sure how comfortable we should be doing it anyway.

Yeah, this advice looks targeted to companies that benefit hugely from targeting their users.

If I'm reading correctly it's basically saying 'once a user has identified themselves to you, then you can go back and figure out the steps they took before that'

As a person, if a company knows what I did right before I bought their product (say in that session) I think I'm ok with that. If they follow me onto other websites or other devices then that feels a lot more invasive.

The reality is that the people doing the tracking are the real users. What a boring relationship to the network - tracking and spying, and grifting.
This is so wrong.

All data that is collected whilst a user is anonymous was collected under the condition of anonymity. Breaking that anonymity by assigning unknown-user data to the now-known user is retroactively changing that user's consent without getting their agreement. Like saying "I know you chose not to be tracked, but now we've had some interaction with you we don't think you meant it". But on what basis?

Not only is this morally/ethically incorrect, it is probably illegal as it is a clear violation of data collection laws. Consent was not given for those prior activities to be tracked. Current consent does not change that.

Edit: Even the suggestion that the stitching together of the data could use the non-PII that was obtained does not get around the fact that permission was not given and by joining the sessions/activity that way you would in fact be de-anonymising (non-PII gets associated with PII).

Will privacy be the future? I feel browser tracking will simply become less relevant in the future. I can't explain why. I'm sure of one thing though: the government will know more about us.
I think of this stuff separately.

1. The government can always use the excuse of "public safety." However,

2. A company that's selling brooms shouldn't have all kinds of access to tracking people. The public doesn't benefit from that.

The government barely knew enough about us to make a website that could connect people to healthcare. Forgive me for being more concerned about what Google can feed into DeepMind Person Predictor in 2026 to know about me.
I mean at the end of the day whether you store it in the cookie or the URL it’s still persistent key value storage for tracking purposes, so I don’t see why the EU’s stance would be any different. It’s effectively still a cookie.

Some activities and cookies are allowed by GDPR without requesting consent, and anonymous analytics (even google analytics) is included in this, so you don’t actually even need a cookie banner to do what you’re trying to do here...

I think from a legal standpoint this is no better than cookies, it doesn’t change whether you need consent or not.

In addition, if you are using SaaS products like Shopify, you already have unique urls for carts and orders so you can also use that without any additional engineering.
Honestly putting the cookie as a param is pretty clever. The challenge with cookies is there are often many cookies that point to one person. Embedded browsers, multiple devices, cookies being refreshed by browsers. So seeing the full journey requires stitching all the cookies.
I'm mildly surprised cookie consent banners are only at 20%, given how often I come across banners whose only option is "Yes, I consent"
I block the entire element, and if moderate actions aren't enough will frequently just move on to other search results.
I've found more often than not they can just be ignored?
Me too. If I ignore or decline, then the next pageview will show the same banner. It's nuts, but I guess it's the new normal.
Then fight fire with fire. Remove the banner.

Ultimately, the site is not allowed to collect data without your consent, but many sites illegally set cookies or store your data in logs before you consent, or even if you decline. With cookies, it is easy to detect bad actors, but not so much with logging.

Rephrase: how to track someone even after they have not given you the consent to.

And how to wait for their one mistake to retrace all their steps and then identify their history without their consent

---------

Track me on your website alright. Tracking me everywhere without my consent? You're a creep and nothing less

Is it hard in practice to figure out who the anonymous ids are? I'm used to just having Segment identify calls.
Good question, the idea in the post is once you know who the user is you make sure they load a page with a unique identifier on it that you can use to identify them.

As an example, think of a Shopify checkout flow. Every user has a unique checkout url. Once they purchase you can use that checkout id in your warehouse to join with the page view that had the anonymous id on it. So you'll have a page view with the anonymous id and a url with a unique checkout id that you can use to join to the ultimate identified user (assuming all your page view and Shopify data are in one place, your data warehouse).
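Here's a small sketch of that join, done in plain Python rather than in a warehouse. The URL pattern, the field names, and `map_anonymous_to_customer` are assumptions made up for illustration; real checkout URLs and order exports may look different.

```python
import re

# Hypothetical URL shape; real checkout URLs may differ.
CHECKOUT_RE = re.compile(r"/checkouts/([A-Za-z0-9]+)")


def map_anonymous_to_customer(page_views, orders):
    """Join page views to orders on the checkout id found in the page view URL.

    page_views: dicts with 'anonymous_id' and 'url'.
    orders: dicts with 'checkout_id' and 'customer_email'.
    Returns {anonymous_id: customer_email}.
    """
    email_by_checkout = {order["checkout_id"]: order["customer_email"] for order in orders}
    mapping = {}
    for view in page_views:
        match = CHECKOUT_RE.search(view["url"])
        # Only page views whose URL carries a known checkout id produce a mapping.
        if match and match.group(1) in email_by_checkout:
            mapping[view["anonymous_id"]] = email_by_checkout[match.group(1)]
    return mapping
```

Page views without a checkout id in the URL stay unmapped on their own; they only get attributed through the anonymous id they share with a checkout page view.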

Let me know if I understood your question!

Matt, I'm curious to know why?

If a person has jumped through hoops to say they don't want to be tracked, why look for ways to still do it?

It's like putting up curtains to keep people from looking in my window, but then you realize you can still see inside if you crouch down really low and look through the 1/8" space between the bottom of the curtain and the window sill.

This puts me in a difficult position of supporting legislation that might be overly harsh like the Do Not Call list back in the 1990s. It pretty much killed off telemarketing for a while. Sad, but the public got fed up with an entire industry that showed it had no regard for the public.

I've got friends who are in marketing and ad-tech and I care about them, their business and success. But when I want to withdraw my consent to be tracked, I want that same level of care and respect.

This is really great. I think it is about what and how one is being tracked. I think people are trying to stop companies from tracking them everywhere but if I am interacting with a single company, I know they are tracking some things. For example, if I buy off of an e-Comm website then they should have tracked my order so I can go back to it.

This extends to support. I hate submitting a support ticket to just be told to go view the docs that I already viewed, or try things I have already done. As a customer, I want you to know I did that and to help me.

All this is trying to do is allow a company to understand their customer within their own platform and own data.

This is not trying to fingerprint users or track everything they do on all websites like Facebook and other platforms.

I hope this clears up the motivation. In short, it is to resolve issues in using the data that a company has internally and is not about creeping on users across the internet.

Ok. Thank you. This does help. I can appreciate that.
Well you can still track unique users without collecting PII. The information sent with every web request by default is still pretty useful. That's how we do it at https://www.hockeystack.com, no need for all this work.
"Step 3 - Attribute anonymous page views to the user!" - not GDPR compliant without consent for that.
How to... um, no, you don't. You use APIs that have OAuth-type interactions for data sharing permissions, with explicit user consent for everything. Just do it. It's the right thing, it's what people want, and it's a clean technical solution.
An International GDPR seems increasingly necessary. Stop trying to right size, hide prices, etc; just give straight good deals and sell good products at good prices.
How is a company supposed to determine what 'good' prices are though? Should they just be covering their costs? Should they just be covering their costs + x%? Sure they could base it off what the rest of the market is doing but where is the market establishing their prices?

I don't disagree that price gouging occurs in many markets - and I agree that an international GDPR would be beneficial to people - but good pricing also doesn't just occur naturally; it's a product of experimentation.

Good prices are what consumers determine are good prices.

Sometimes that's razor-thin margins. Sometimes it lets you get away with high margins.

Isn't this even more creepy?
Yes. It is.
Identity Resolution via warehouse is the future!!! I love this!
Author here - we've been able to identify anonymous users pretty consistently once they convert to becoming users. This talks about our approach and how to do it, while still following all the rules around tracking cookies, etc…
Why do you talk about consent with regards to cookies only? GDPR deals with so much more with regards to tracking and identifiable information.

For example this quote from the article: "Add a unique identifier to all urls on your site when you know who the user is."

I don't see how our legal would allow us to do this with European customers without explicit opt-in consent since this kind of tracking and data processing cannot be deemed a legitimate requirement for the core function of the service.

If the same service can be given to the visitor without the unique identifier in the URL, then I see no way to avoid asking for consent.

https://gdpr.eu/recital-30-online-identifiers-for-profiling-...

Because most people haven't read GDPR or similar laws, and play it by ear. GDPR is often called the cookie law, which means people who don't read the law and don't hire lawyers end up doing things like this.

What is the EU going to do anyway? I've yet to see any meaningful challenge from the EU about GDPR.

The identifier on the urls isn't meant to identify the actual user I think.

If you look at the examples given they're more like identifiers to something else -- an order id or subscription id.

Wouldn't tracking something like an order (but not the user directly) be ok with GDPR?

They are using (in the example) an order number as a proxy to identify and track the actual user. From the article: "Simply look up the user from the identifier, note the anonymous id, and replace the anonymous id with a real user in the data."

At this point the tracking of the online identifier has certainly passed the threshold into tracking an individual for reasons not directly related to the service.

https://gdpr.eu/article-4-definitions/

"1. ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;"

The order number in this case falls under "an identification number" and "an online identifier" at the very least.

"2. ‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction;"

What is happening is at the very least processing, recording, storing, dissemination, combination of that data.

A company may store both customer data and order data and keep them under GDPR, because a particular customer provided it knowingly. The important piece is when a customer asks to be removed, the company must remove their customer data (e.g. their name and address) but the order information can remain orphaned in order to do analyses on revenue, orders, etc. The right to be forgotten is ONLY about customer data, not related anonymized identifiers that tie back to the previous customer's order history.
Actually even the personal details associated with the order often must be kept even if a person requests their removal. The GDPR doesn’t trump other financial, consumer protection, and anti-fraud laws.

Example: if you buy a lawnmower, the seller may be required to notify you of any safety recalls for many years (depending on location). GDPR does not change this requirement for saving personal contact data with the order data, even if the buyer later says “forget me”.