Any distro that phones home with a unique identifier is a distro I won't touch with a ten foot pole. I don't care what they claim they will or won't use that identifier for.
Maxims that act on the symptom rather than the problem rarely help in the end, as the problem just evolves to support its needs through other means.
For example, sending a unique identifier is not the problem. Tracking people through a unique identifier is. So, depending on your goals, you can design a unique identifier system that does not allow tracking (or at least makes the tracking period so small as to be unuseful for purposes other than designed) as outlined in the article through changing the identifier on the client side weekly.
If all you want to do is get a good estimate of how many users use what types of configurations of your software (major and minor version), a UUID that rotates weekly on the client side is perfectly acceptable to use for those statistics to a fair degree of accuracy.
On the other end of the spectrum, people long ago started reducing their trackable footprint online, and the online tracking ecosystem just evolved to finding people through other, trickier methods, such as browser fingerprinting.
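The weekly client-side rotation mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RotatingId`, the 7-day default, and the in-memory storage are hypothetical and not taken from the article or from any Fedora tooling.

```python
import uuid
from datetime import date, timedelta


class RotatingId:
    """Hypothetical client-side identifier that is replaced once per period."""

    def __init__(self, period_days=7):
        self.period = timedelta(days=period_days)
        self.value = None   # current random identifier
        self.issued = None  # date on which the current identifier was generated

    def current(self, today):
        # Generate a fresh random UUID if there is none yet or the old one expired.
        # The new value is not derived from the old one, so the server cannot
        # link consecutive identifiers by looking at the identifier alone.
        if self.value is None or today - self.issued >= self.period:
            self.value = str(uuid.uuid4())
            self.issued = today
        return self.value


rid = RotatingId()
first = rid.current(date(2019, 1, 1))
same_week = rid.current(date(2019, 1, 5))
next_week = rid.current(date(2019, 1, 8))
```

Here `first` and `same_week` are the same string, while `next_week` is a new random value. Note that this only limits linkage through the identifier itself; the IP address is still visible to the server.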
You're right in general, of course. But here's the reason for my hardline stance on that: history shows that trusting promises or assertions made about things like unique identifiers is unwise, and so I have to take a strong defensive stance.
> you can design a unique identifier system that does not allow tracking
You can (sort of), but we run up against that trust issue again. If I'm giving a unique identifier to someone, I have no way of knowing if their assertions about its use are accurate. Even if they are, there's no guarantee that won't change in the future.
> If all you want to do is get a good estimate of how many users use what types of configurations of your software (major and minor version)
You're talking about the perspective of the publisher. I'm talking about my perspective as a user. A company's "need" to collect metrics is their problem, not mine. If their solution results in more information disclosure than I'm comfortable with (and a unique identifier absolutely is), then I will avoid their software or block communications to their home base.
> A company's "need" to collect metrics is their problem, not mine.
When it's couched in how to deliver software updates, it becomes your problem as well. That's a transaction, and they want to charge more for it now. You can decide it's too costly, as you indicate here, but it's not like they're giving nothing in return.
I think it's important to note the goals of those involved. In this case, it's the people that put together a free product for us to use and also supply free timely software updates looking for more information on who is using what so they can do a better job at delivering that free stuff to us.
And in this case, it's not adding tracking where it doesn't exist, it's making it better for the specific cases that are useful to them and that impact users the least (an accounting of software configurations). They already track through IP address, but that's inaccurate to a much larger degree for the information they want (but somewhat less so for the personal information you likely want to protect). Adding an additional system that allows better tracking of the useful information without increasing the personally identifying features of IP based tracking (which still exists) is laudable, in my eyes.
> When it's couched in how to deliver software updates, it becomes your problem as well.
I honestly don't see how. If/when I'm ready to take an update, I can come get it myself. If they want to charge me (or charge me more) for it, then they can do so at that time. No tracking needed except for that associated with payment.
> Adding an additional system that allows better tracking of the useful information without increasing the personally identifying features of IP based tracking (which still exists) is laudable, in my eyes.
Not as laudable as not engaging in tracking in the first place. However, I don't see how this doesn't increase personally identifying features. On the contrary, it's adding one: a unique identifier.
> If they want to charge me (or charge me more) for it, then they can do so at that time. No tracking needed except for that associated with payment.
That's what's proposed? An identifier sent along with the request to see the current list of updates available?
> I don't see how this doesn't increase personally identifying features. On the contrary, it's adding one: a unique identifier.
An identifier that changes every week or so. At that point it is useless for identifying an individual, but can still be used statistically to determine how many systems are running what versions of Fedora, even behind NAT gateways. The only difference from before is now instead of "there's one IP with more than average check-ins, or check-ins from two or more different configurations", it's "there's one IP with X number of unique identifiers that randomize weekly seen over the last 28 days, so we can approximate X/4 different systems behind that IP".
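The X/4 approximation above can be written out as a small calculation. This is a sketch under assumptions: the function name `estimate_systems` and the `(ip, identifier)` input shape are made up for illustration, and it assumes every system checks in at least once per rotation period.

```python
from collections import defaultdict


def estimate_systems(checkins, window_days=28, rotation_days=7):
    """Estimate systems per IP from (ip, identifier) pairs seen in the window."""
    ids_per_ip = defaultdict(set)
    for ip, ident in checkins:
        ids_per_ip[ip].add(ident)  # repeated check-ins with the same ID count once
    # Each system is expected to show up under this many different identifiers.
    rotations = window_days / rotation_days
    return {ip: len(ids) / rotations for ip, ids in ids_per_ip.items()}


# Two machines behind one NAT address, each using 4 weekly identifiers,
# with every identifier seen twice.
checkins = [("203.0.113.7", f"machine{m}-week{w}")
            for m in range(2) for w in range(4)] * 2
estimate = estimate_systems(checkins)
```

With this input, `estimate["203.0.113.7"]` is 8 distinct identifiers divided by 4 rotations, i.e. 2.0. Systems that check in less often than once per rotation would be undercounted.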
Yes, I understand, but your explanation isn't reassuring to me. It's confirming that I actually do understand the mechanism and its ramifications.
Red Hat can do whatever it likes (although my take on it is that they're not likely to do this unique identifier thing). I'm not saying otherwise -- that's their right, after all.
All I am saying is that software that does this sort of thing is unacceptable to me and I will avoid it to the best of my ability. As is my right.
You said free so many times that I had to share some news I learned earlier today: the 'free as in freedom' podcast is releasing new episodes again after a 2-3 year hiatus!
> A company's "need" to collect metrics is their problem, not mine.
And your need to run an OS on your computer is your problem, not theirs. What do you do if everyone on the sell side of the market uses telemetry? Just stop using computers?
> What do you do if everyone on the sell side of the market uses telemetry? Just stop using computers?
Well, that's not going to happen. I doubt Slackware would go down that road, for example.
But let's say that what you assert happens -- all that means is that I won't use distros. It doesn't mean that I won't use computers.
It's entirely possible to install Linux without using a distro or prebuilt binaries at all. It's also possible to keep using an older version of the operating system.
But, being essentially lazy, what I'd most likely do is an extension of what I do with most applications these days: firewall off the servers that the OS is trying to communicate with.
There are reasons to draw a line in the sand, to say that even attempting to do some things is contrary to a strong norm that we will defend even if you promise that you're not using it for anything malicious, something which is hard to police.
Taking a strong stand against tracking and, therefore, in favor of privacy is perfectly reasonable for people who use Linux in part due to our hatred of the deep tracking closed-source OSes do.
The problem with drawing lines in the sand is that you trip up all the players that make an effort to act responsibly as well, thus reducing the incentive to act responsibly.
You're basically reducing market effectiveness by ignoring the details of available information and grouping unalike things together. The market will likely respond by reducing access to or the clarity of that information (e.g. they'll track you, but hide it, even if it's innocuous and the vast majority would have no problem with what info is given up, because apparently people can't be bothered to make a decision on anything but the coarsest of details).
You speak of "tracking" as if it's all the same thing. Every sale you make at a store is tracked, and for good reason, for both the customer and the store (how else do you allow returns?). Every time you visit a doctor, they add the info regarding your visit to a log. That's tracking. Tracking itself is not bad.
Tracking individuals and personal information about them while they are trying to remain anonymous or have no expectation anything personal has been revealed is bad.
Attacking anything with the word tracking in it because it's been conflated with this, even though it shares little or no resemblance and can't be used later for this purpose in its current form, is just FUD and an indicator of how broken human communication fundamentally is.
> Every time you visit a doctor, they add the info regarding your visit to a log.
JohnFen already said most of what I'd say about these examples, but I want to add one big thing:
The tracking the medical world does is controlled by law. Laws people take very, very seriously. It therefore can't be mixed with other data through being resold or in any other fashion to help form a more accurate picture of me.
That data re-use is part of why I want strong norms against data collection.
> You speak of "tracking" as if it's all the same thing.
True, and that's bad of me. I'm speaking in shorthand.
> Every sale you make at a store is tracked
But the store does not track me if I don't use a card. Returns are handled through the receipt that they give me during the transaction. That's a kind of tracking, but tracking the transaction itself, not me.
> Every time you visit a doctor, they add the info regarding your visit to a log. That's tracking. Tracking itself is not bad.
Indeed, and here's where I'll try to introduce the shades of gray I left out. I consent to the doctor tracking me to that extent (but I would object strongly if the doctor started keeping track of my whereabouts or what I was doing). The doctor even gives me a consent form affirming that. If I'm not OK with the tracking, I don't see that doctor. Software is no different in this sense.
I oppose tracking that I don't give affirmative consent for. In the case of Red Hat's purpose, I will not give such consent, as the cost/benefit ratio is not sufficiently weighted to the "benefit" side.
> is just FUD and an indicator of how broken human communication fundamentally is.
It's not FUD, as I'm not claiming that Red Hat is intending to do anything nefarious. And I don't see this as a human communication problem.
Speaking personally, this is a reaction to the trend in software and online to engage in massive amounts of user tracking and data collection, both disclosed and undisclosed, that has resulted in real harm (both intentional and unintentional).
Once bitten, twice shy and all of that. This is a problem that comes from real misbehavior of software companies, not from poor communications.
I would think if it rotated on the first of each month, that would probably be sufficient... then you could get your counts for any given month (excluding first/last day), assuming most systems check every week or two at least, and it would be pretty consistent.
Rotating the identifier means you lose the information about attrition rate.
If you have some number of users leaving, but a similar number incoming, then it would look like you have a consistent usage. Losing the info about lost users means you don't improve in retention.
Could you regain this info by adding a static prefix to the rolled ID? So you know it was rolled, but not from which previous ID. Whereas new IDs would have no prefix, so you can count new users as new.
Could just be the date of install in UTC as a prefix, the other part randomized on the first of the month... they could still calculate relative drop off, and still get better stats more anonymously.
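The install-date prefix idea can be sketched like this. Everything here is hypothetical: the identifier layout (ISO date, a dash, then random hex) and the helper names are assumptions made for illustration only.

```python
import secrets
from collections import Counter
from datetime import date


def make_identifier(install_date_utc):
    """Static install-date prefix plus a random part that is redrawn on each rotation."""
    return f"{install_date_utc.isoformat()}-{secrets.token_hex(8)}"


def cohort_counts(identifiers):
    """Count distinct identifiers per install-date cohort for one rotation period."""
    # date.isoformat() is always 10 characters long (YYYY-MM-DD).
    return Counter(ident[:10] for ident in set(identifiers))


# Three systems installed on 2019-01-15 are seen in one month, two in the next.
january = [make_identifier(date(2019, 1, 15)) for _ in range(3)]
february = [make_identifier(date(2019, 1, 15)) for _ in range(2)]
drop_off = cohort_counts(january)["2019-01-15"] - cohort_counts(february)["2019-01-15"]
```

Comparing the same cohort across periods gives the relative drop-off without linking individual identifiers. One trade-off worth keeping in mind: a day-granular install date splits users into small groups, so a coarser prefix (for example the install month) would reveal less.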
Later on in the article they describe a revised solution that doesn't do that:
> Poettering came up with a scheme that alleviated most of the problems that were identified. He proposed that a "countme" flag simply be added to a single mirror-list query each week. The sum of all such queries over a week's time should provide an accurate estimate of the number of Fedora systems. That way, UUIDs need not be stored, which removes much of the concern—data that is not stored cannot be misused.
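A rough sketch of how a scheme like the quoted one could work is below. It is not Fedora's actual implementation: the class `CountmeClient`, the `repo` parameter, and the use of ISO weeks to decide when to set the flag are all assumptions for illustration. Only the idea of a "countme" flag on a single query per week comes from the quote.

```python
from datetime import date


class CountmeClient:
    """Hypothetical client that flags exactly one mirror-list query per ISO week."""

    def __init__(self):
        self.last_counted_week = None  # (ISO year, ISO week) of the last flagged query

    def build_query(self, today):
        params = {"repo": "example"}  # placeholder parameter, not a real API
        week = tuple(today.isocalendar())[:2]
        if week != self.last_counted_week:
            params["countme"] = "1"  # no identifier is sent, only the flag
            self.last_counted_week = week
        return params


def count_systems(queries):
    """Server side: the weekly estimate is simply the number of flagged queries."""
    return sum(1 for q in queries if q.get("countme") == "1")


client = CountmeClient()
queries = [client.build_query(d)
           for d in (date(2019, 1, 7), date(2019, 1, 9), date(2019, 1, 14))]
```

The first two dates fall in the same ISO week, so only the first and third queries carry the flag. The state that prevents double counting lives entirely on the client, so the server has nothing per-system to store.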