by alkonaut
972 days ago
Unless there is any PII associated with the pseudonym, there is nothing specifically in GDPR that says you can't or shouldn't do this, so long as it's not information that can identify a physical person. Note that being able to attribute multiple pieces of data to the same anonymous person does not necessarily identify them (and it's important not to do so accidentally). If you have multiple products, for example, it's important to use a _different_ pseudonymization (hash salt or whatever) for each one. Otherwise you risk linking so much data to one user that you de-pseudonymize them in the worst case, even though no individual app does. Having a user's behavior across multiple applications could pose such a risk in extreme cases.

Edit: I think it's important to separate "hashing" from properly salted "hashing". A properly hashed identifier uses a salt that is generated on the client, so that it can't be used to identify the user. Basically: the first time the app runs, you generate a random salt which is only stored on the client and NEVER sent in telemetry. Anything you would like to transmit over the wire that would risk identifying the user (e.g. a computer name or MAC address) you hash with this local salt. This way no one can go to the database on the server side and try to match the data, e.g. check whether the hash abc123 matches the computer name jimbob because hash("jimbob") = abc123. Just sending hash(MacAddress) without a local random salt would NOT be properly pseudonymous, because an attacker on the server side could ask and answer the question "Does this come from the address macaddress?".
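As a rough sketch of the local-salt idea, something like the following could work. The function names and the salt file location are made up for illustration, and HMAC-SHA256 is used here as one reasonable way to combine the salt with the value; it is not what any particular product is known to do.

```python
import hashlib
import hmac
import secrets
from pathlib import Path


def load_or_create_salt(path: Path) -> bytes:
    """Return the client-local salt, generating it on first run.

    The salt lives only on the client and must never be sent in telemetry.
    (The storage location is a hypothetical choice for this sketch.)
    """
    if path.exists():
        return path.read_bytes()
    salt = secrets.token_bytes(32)  # 256 bits from the OS CSPRNG
    path.write_bytes(salt)
    return salt


def pseudonymize(value: str, salt: bytes) -> str:
    """Hash an identifying value (computer name, MAC address) with the local salt.

    Assumption: HMAC-SHA256 keyed with the salt stands in for "hash with a salt".
    Without the salt, the server cannot recompute the digest from a guessed value.
    """
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

The client would call `pseudonymize("jimbob", salt)` and transmit only the resulting digest. Using a separate salt file per product gives each product its own unlinkable pseudonym for the same machine.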
|
I think the massive amounts of behaviour analysis Microsoft does should be considered PII. They know when you open Visual Studio in the morning, and when you leave. They know when you go to lunch and don't click any buttons for a while, and they can see the colleagues with you in that boring meeting also not clicking any buttons at the same time. This type of behaviour analysis over time can associate you and the people you interact with, even if it's not directly tied to a reversible hardware ID.
This is why pseudonymisation isn't anonymisation, and why pseudonymisation isn't sufficient to comply with laws like the GDPR.
If the behaviour analysis was done without identifiers at all, you could say they're just counting button clicks, but they intentionally associate this data with your stable personal identifier for analysis over time.
MAC addresses aren't that big of a search space either: any consumer GPU can enumerate all hardware MAC addresses in use in a reasonable amount of time. In theory there are 2^48 possible MAC addresses, but most of that space hasn't been assigned to vendors yet. Reversing any given hashed MAC address takes about 12 minutes on a single rented cloud GPU. Double hashing should take about twice that time.
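To make the enumeration concrete, here is a minimal CPU sketch of reversing an unsalted hash of a MAC address. Each vendor prefix (OUI) covers only 2^24, roughly 16.7 million, device suffixes, so the search is a nested loop over known prefixes and suffixes. The hash format (SHA-256 over the lowercase colon-separated string) and all names are assumptions for illustration, not the scheme any real telemetry uses.

```python
import hashlib
from typing import Iterable, Optional


def mac_hash(mac: str) -> str:
    """Unsalted hash of a MAC address (assumed format: SHA-256 of 'aa:bb:cc:dd:ee:ff')."""
    return hashlib.sha256(mac.encode("ascii")).hexdigest()


def format_mac(value: int) -> str:
    """Render a 48-bit integer as a lowercase colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


def reverse_mac_hash(
    target: str, ouis: Iterable[int], suffix_bits: int = 24
) -> Optional[str]:
    """Try every device suffix under each 24-bit vendor prefix (OUI).

    suffix_bits is a hypothetical knob to shrink the search for demos;
    the full per-vendor space is 24 bits.
    """
    for oui in ouis:
        base = oui << 24  # vendor prefix occupies the top 24 bits
        for suffix in range(1 << suffix_bits):
            mac = format_mac(base | suffix)
            if mac_hash(mac) == target:
                return mac
    return None
```

For example, `reverse_mac_hash(mac_hash("00:1a:2b:00:01:f4"), [0x001A2B], suffix_bits=12)` recovers the address after at most 4096 guesses; the prefix there is an arbitrary placeholder. A GPU does the same thing over every assigned prefix, only much faster.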
The weird thing is that Microsoft intentionally chose to use a MAC address rather than a UUID like they use on their web version. If this was just a unique user token, they wouldn't need to use any hardware identifiers at all.