by slg 1730 days ago
>At some point we have to accept that this data actually helps them fix real issues too

It is strange that this community of all communities has such a resistance to this idea. Software developers should know better than anyone how difficult it is to identify and fix vague software problems without having specific details about the problem. Yes, there is a negotiation between the value of telemetry and privacy and often too much privacy is sacrificed. But I am always surprised to hear developers say all telemetry is bad.


There are at least two distinct issues here this community is concerned about:

1. Telemetry is unethical if the users didn't provide informed consent (opt-in) for it. It's not just a theoretical point - anyone who's worked in the tech sector for a while should know most companies cannot be trusted to behave ethically (especially if they took VC funding).

2. There's a certain dysfunction/antipattern that's popular in the tech sector, called being a "data-driven company". It's the practice of making decisions through divination from data collected through extensive telemetry, to the exclusion of other knowledge sources (e.g. actually talking to your users, hallway testing, or thinking things through). This leads to software being optimized in questionable directions - so in a sense, you could say that adding telemetry implies an increased chance the software will become worse over time.

Worse in a really insidious way too. Optimizing based on engagement, for instance, could mean maximizing the amount of time people spend using a service while minimizing unseen variables people care about, like their emotional state while engaged. It’s sort of an inevitable thing that optimizers do to things they don’t measure, and it’s an extremely difficult problem to put everything people care about into the equation. For instance, think of how horrible a polynomial fit gets for a function just outside the window you are fitting as you add more terms.
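
The polynomial point can be checked with a few lines of Python. This is only a sketch under loud assumptions: it uses exact interpolation through equally spaced samples as a stand-in for a least-squares fit, the target function 1/(1+25x²) is an arbitrary choice, and the names `runge`, `interpolate`, and `error_at` are made up for illustration.

```python
def runge(x):
    """Arbitrary smooth test function (an assumption, not from the thread)."""
    return 1.0 / (1.0 + 25.0 * x * x)


def interpolate(nodes, values, x):
    """Evaluate the Lagrange polynomial through (nodes, values) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def error_at(degree, x):
    """Absolute error at x of the degree-`degree` polynomial fitted on [-1, 1]."""
    nodes = [-1.0 + 2.0 * k / degree for k in range(degree + 1)]
    values = [runge(n) for n in nodes]
    return abs(interpolate(nodes, values, x) - runge(x))


if __name__ == "__main__":
    for degree in (2, 4):
        inside = error_at(degree, 0.25)   # inside the fitting window
        outside = error_at(degree, 1.5)   # just outside the window
        print(f"degree {degree}: inside={inside:.3f} outside={outside:.3f}")
```

For this particular function, going from degree 2 to degree 4 shrinks the error at x = 0.25 but grows the error at x = 1.5 from roughly 1.2 to roughly 8.1: the part being measured improves while the part outside the window gets worse.
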
> But last week, Facebook revealed that it had manipulated the news feeds of over half a million randomly selected users to change the number of positive and negative posts they saw. It was part of a psychological study to examine how emotions can be spread on social media.

Doing such a study would be the first step in optimizing for a good emotional state. But it (quite understandably) led to an outcry which stopped it dead in its tracks.

https://www.nytimes.com/2014/06/30/technology/facebook-tinke...

Sentiment analysis will tell you people have positive feelings about cute animals and negative feelings about friends getting cancer. An extremely difficult problem to put everything people care about into the equation.

I think it's way more complicated than that. Like you could measure how much better someone feels after seeing cute animals. Short term happiness isn't the only goal and needs to be traded off against other things, such as informing the user about what their friends are doing (you probably do want to know if they get cancer even if learning about it is sad), trending topics so you are clued into what everyone else is talking about, and of course ads. Learning more about the numbers allows better tradeoffs.

The impact of missing important personal news isn't quantifiable in practice. Different people want different things from the platform too. And Facebook can't put everything people care about into the equation as long as the real goal remains getting them to look at more ads.

> Telemetry is unethical if the users didn't provide informed consent (opt-in) for it.

And if the only way to opt out is to not use the product. Or stop using it particularly.

>There's a certain dysfunction/antipattern that's popular in tech sector, called being a "data-driven company".

This is sarcasm. Right?

I (like most here) have personally benefitted from extensive software telemetry but am generally against it in my personal life. But I also write business software where the expectation of privacy is already moot.

Gaming is a weird point though: telemetrics are already in heavy use, the privacy risk is indeed minimal, and gamers don't seem to care anyway. I can't tell you how many GDC talks I've watched that discussed player heatmaps, incident (like death) reports, and whatnot used to fine-tune game balance.

There are numerous places where telemetry is completely inappropriate, like one's operating system. An idle computer should indeed be 100% idle, internal housekeeping exempted. (I recently installed freebsd on a new server, did some setup, and basked in the glory of htop showing 32 cores at 0.0% and a root process list that was under a page long. I wish other operating systems could follow that example)

> freebsd on a new server

It always makes me happy to see how short the list returned from `ps aux` is with FreeBSD. Whereas if you go onto a typical Linux box (even just a Raspberry Pi!) and run that, you get at least a screenful of processes doing who knows what. (Just my small experience.)

This is also the experience with Ubuntu on Windows WSL:

  $ ps aux
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root         1  3.5  0.0   8944   332 ?        Ssl  11:38   0:00 /init
  root         7  0.0  0.0   8944   228 tty1     Ss   11:38   0:00 /init
  dando        8  1.7  0.0  16804  3396 tty1     S    11:38   0:00 -bash
  dando       32  0.0  0.0  17392  1916 tty1     R    11:38   0:00 ps aux

A strange irony that the most purist Unix-like Linux is to be found in the belly of the Windows beast, for the equivalent of those hundreds of processes doing who knows what on a typical native Linux install are instead all in Windows Task Manager.

Tip: If a process is a service-host, run `tasklist /svc` and that way you can actually know what's going on in each process.

If you install a server distro, the number of default processes is pretty minimal. Run GNOME on FreeBSD and you'll get the same huge list of processes.

Not so sure about that, my last fresh install of Ubuntu Server (admittedly like 4-5 years ago now) had several cores hovering in the 20-30% range while idle and nothing was installed on the server yet. "Ooop, guess I'm going back to debian"

> If you install a server distro, the number of default processes is pretty minimal.

It depends on which processes. If you include kernel threads (which show as processes), the number of processes can get pretty huge, especially on many-core servers (several of these kernel threads are per-core).

I don't know whether on FreeBSD kernel threads show up as separate processes; if they don't, it might explain part of the difference.

I totally agree. Telemetry is invaluable to making software better.

Transparency is key here. If projects explained the steps taken to anonymize the data (either provably using Differential Privacy, or approximated via some other means), I feel like people might trust them more. Even with DP, though, the server does see IP addresses, even if it doesn't know what the telemetry is, and that alone might cross the line for some people. Even if the project promises to not log them.
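
One classic mechanism that satisfies differential privacy is randomized response. The sketch below assumes a single yes/no telemetry bit per user (say, "did feature X crash?"), and the function names and the 50/50 coin are illustrative choices, not something any project in this thread is known to use. Each client lies with known probability, so no individual report can be trusted, yet the aggregate rate can still be estimated.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the true bit half of the time, otherwise a fair coin flip.

    P(report yes | truth yes) = 0.75 and P(report yes | truth no) = 0.25,
    so this mechanism is epsilon-differentially private with epsilon = ln 3.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_rate(reports) -> float:
    """Invert the noise: observed = 0.5 * true_rate + 0.25."""
    observed = sum(reports) / len(reports)
    return 2.0 * (observed - 0.25)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population in which 30% of users have the true bit set.
    truths = [i % 10 < 3 for i in range(100_000)]
    reports = [randomized_response(t, rng) for t in truths]
    print(f"estimated rate: {estimate_rate(reports):.3f}")
```

This only protects the content of each report. It does nothing about the server seeing the sender's IP address, which is the separate concern raised above.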

The problem with explanations and promises is that there's a lot of bad water that has flowed under those bridges. They amount to just saying "trust me".

As a dev, telemetry makes me nervous because so many projects rely on it too heavily and make bad design decisions because the telemetry blinds them.

Absolutely. There are steps that software can take to build this trust, though, like

- Have a screen that shows all telemetry that is going to be or has been sent

- Ask for permission to send any telemetry, or certain types of telemetry (e.g. crash reports)

- Publicly share the collected telemetry.
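
The first two items in the list above can be sketched in a few lines. This is a hypothetical design, not any real project's API: `TelemetryQueue`, its category names, and the `send` callback are all invented for illustration. Events are dropped unless the user has allowed their category, and everything pending or already sent can be shown to the user verbatim.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TelemetryQueue:
    """Holds events locally; nothing leaves without per-category consent."""

    allowed_categories: set = field(default_factory=set)
    pending: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def record(self, category: str, payload: dict) -> bool:
        """Queue an event only if the user opted in to this category."""
        if category not in self.allowed_categories:
            return False
        self.pending.append({"category": category, "payload": payload})
        return True

    def preview(self) -> str:
        """Text for a 'what will be / has been sent' screen."""
        return json.dumps(
            {"pending": self.pending, "sent": self.sent},
            indent=2,
            sort_keys=True,
        )

    def flush(self, send) -> int:
        """Hand each pending event to `send` and remember it as sent."""
        count = 0
        for event in self.pending:
            send(event)
            self.sent.append(event)
            count += 1
        self.pending.clear()
        return count


if __name__ == "__main__":
    queue = TelemetryQueue(allowed_categories={"crash"})
    queue.record("usage", {"screen": "settings"})   # dropped: not opted in
    queue.record("crash", {"signature": "example"})
    print(queue.preview())
```

Publicly sharing the collected data, the third item, is a server-side policy decision and is not covered by this sketch.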

> I totally agree. Telemetry is invaluable to making software better.

Can you give 1 (one) example of a program which was improved by using telemetry? And no, trashing the UI in the name of change or modernism does not count.

Thank you.

We introduced some opt-in telemetry in dolphin-emu.org several years ago. I remember several times we discovered things we would likely have completely missed otherwise:

- We found out by looking at the distribution of software versions that we had a strong holdout on one specific revision. It turns out we had a regression in a niche feature which was very important to a sub-community of our users, and users were basically telling each other to just use that old version. No bug report was filed until we found out via analytics and asked.

- We have a "game quirks" mechanism where the emulator reports weird edge cases that happen very rarely. Current list: https://github.com/dolphin-emu/dolphin/blob/master/Source/Co..., example usage: https://github.com/dolphin-emu/dolphin/blob/ffdc8538a162b1ca... . We used this to find games that use currently unimplemented or stubbed features.

- The list of popular games being played on the emulator was extremely surprising because it turns out there's a huge disconnect in what most NA/EU players are playing and what JP players are playing. This led to us adding a bunch of new games to the list we regularly test for performance and stability regressions. Would you have guessed that Inazuma Eleven GO: Strikers 2013 is in the top 10 of emulated games on Dolphin?

This is a fascinating answer that has persuaded me to your side. I recall opting out of Dolphin telemetry because I simply couldn't be bothered to check what would be sent, but seeing not only examples of what data is sent but also how it's used in such a positive way will definitely have me turning the telemetry on next time I go to use it.

We have a privacy policy which describes what we collect in a bit more detail, see https://dolphin-emu.org/docs/privacy/

https://github.com/dolphin-emu/dolphin/blob/master/Source/Co... is the actual code which collects most of the information. We do multiple things to avoid being able to track user activity too much -- for example, while every instance of Dolphin has a unique ID so we can do things like unique counts, events that happen within a play session are associated to truncated_hash(unique ID + game ID) and not directly with the unique ID. This means that we can only correlate events from the same user playing the same game, but not* two events from one user playing different games.

* Our implementation is a bit weak given that the set of all gameids is small and enumerable. We could probably do better there.
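
To make the truncated_hash(unique ID + game ID) idea and its caveat concrete, here is a minimal sketch. It is not Dolphin's actual code: the choice of SHA-256, the `:` separator, the 16-character truncation, and the names `session_key` and `guess_games` are all assumptions, and the IDs are placeholders.

```python
import hashlib


def session_key(unique_id: str, game_id: str, length: int = 16) -> str:
    """Per-(user, game) key: events share a key only within one user+game pair."""
    digest = hashlib.sha256(f"{unique_id}:{game_id}".encode("utf-8")).hexdigest()
    return digest[:length]


def guess_games(unique_id: str, known_game_ids, observed_key: str) -> list:
    """The weakness: with few possible game IDs, anyone holding the unique ID
    can recompute every candidate key and link the sessions back together."""
    return [g for g in known_game_ids if session_key(unique_id, g) == observed_key]


if __name__ == "__main__":
    user = "install-1234"                  # placeholder unique ID
    games = ["GAME01", "GAME02", "GAME03"]  # placeholder game IDs

    key_a = session_key(user, "GAME01")
    key_b = session_key(user, "GAME02")
    print(key_a != key_b)                   # different games do not correlate directly
    print(guess_games(user, games, key_b))  # but enumeration recovers the game
```

A larger or secret input mixed into the hash would make the enumeration harder, at the cost of more state to manage; which trade-off is right here is a judgment call.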

The word “telemetry” can mean a lot more than “detailed usage data to inform design decisions”

I’m talking extremely basic things, like, what are the most popular crashes in the app? Which did I introduce in the most recent version? Why is there an increase in end-to-end latency in fetching data from the server? Stuff that would fall under “bug fixes and improvements” that you would likely not notice.
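
The first two questions above reduce to simple counting once crash reports carry a signature and a version. A minimal sketch, assuming each report is a dict with hypothetical `signature` and `version` fields (the field names and sample values are invented):

```python
from collections import Counter


def top_crashes(reports, n=3):
    """Most frequent crash signatures across all reports."""
    return Counter(r["signature"] for r in reports).most_common(n)


def new_in_version(reports, version):
    """Signatures seen in `version` but in no other version's reports.

    Assumes every other version in `reports` is older than `version`.
    """
    seen_elsewhere = {r["signature"] for r in reports if r["version"] != version}
    seen_here = {r["signature"] for r in reports if r["version"] == version}
    return sorted(seen_here - seen_elsewhere)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    reports = [
        {"signature": "null_deref@render", "version": "1.0"},
        {"signature": "null_deref@render", "version": "1.1"},
        {"signature": "null_deref@render", "version": "1.1"},
        {"signature": "timeout@sync", "version": "1.1"},
        {"signature": "timeout@sync", "version": "1.1"},
        {"signature": "oom@import", "version": "1.0"},
    ]
    print(top_crashes(reports, n=2))
    print(new_in_version(reports, "1.1"))
```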

The word telemetry became common when it became more than optional crash reports and update checks.

The last example sounds detailed enough. And some privacy considerations are moot when the app is a client for your server.

Windows.

You mean that calc.exe opening in 3 seconds is better than instantly? Or disappearing ribbon and title bar on windows in a multimonitor setup is better than before? Or the newly designed shutdown menu? Or the white fonts on light background? Or disappearing scrollbars because someone thought that this is a good idea? Or the new save dialog where you need 3 clicks just to be able to select the folder where you want to save?

Windows went downhill fast since the introduction of telemetry.

Windows 10 is better than any previous version of Windows. So how did it go downhill?

User experience. Windows 10 was an upgrade over 8, which was a huge downgrade over 7. Not only did telemetry not help Microsoft improve the UX of their OS after Windows 7, in Windows 10 it's been getting worse with each update, and Windows 11 is slated to be a disaster.

Windows has been getting worse for the last 3 or so generations of Windows.

Windows 10 is better than any previous version. I mean my Windows laptop is far more stable than any distro I run on my desktop and work laptop.

How? I mean, in what way, and in response to which metrics?

Office.

I completely agree. However, Microsoft has demonstrated user-hostile tactics in the telemetry field so people don't feel inclined to trust them.

It helps that when you have a front row seat at a butcher's house, you become somewhat averse to eating meat as part of your regular diet. From that perspective, it is really not that surprising.

I like this analogy, let's run with it even further! Can the aversion be caused by watching butchers that kill animals in an inhumane way, in a dirty shed?

(i.e. if you observe telemetry data being abused at the company you work for and that is tolerated, you'll be wary of any telemetry?)

It’s the difference between always sending something and a pop up asking if you want to send a bug report.

How about performance issues and silent errors? Finding areas that need to be optimized because a lot of users have subtle issues kinda needs passive data collection.

As long as it is anonymous, it's a good thing, IMO.

Yup. A previous company I worked at used assert calls in the code. In a debug build of the firmware, if an assert failed, it would crash the device and on the display it would show the file and line number of the assert call. In production firmware, it would silently ignore it, and if you had opted in to sending usage data, it would phone home and report the failed assert, though I don't know what extra data was included.
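
The behaviour described above can be sketched roughly as follows. The original was firmware, so this Python version is only an analogy, and the `AssertReporter` class, its flags, and the report format are all hypothetical. What was actually included in the real reports is unknown, so this sketch sends only the file and line.

```python
class AssertReporter:
    """Debug builds crash loudly; production builds report only with opt-in."""

    def __init__(self, debug_build: bool, opted_in: bool):
        self.debug_build = debug_build
        self.opted_in = opted_in
        self.outbox = []  # reports waiting to be sent home

    def check(self, condition: bool, file: str, line: int) -> bool:
        """Return True if the assertion held, otherwise handle the failure."""
        if condition:
            return True
        if self.debug_build:
            # Debug firmware: stop immediately and show where it failed.
            raise AssertionError(f"{file}:{line}")
        if self.opted_in:
            # Production firmware with consent: carry on, queue a report.
            self.outbox.append({"file": file, "line": line})
        # Production firmware without consent: carry on silently.
        return False


if __name__ == "__main__":
    prod = AssertReporter(debug_build=False, opted_in=True)
    prod.check(1 + 1 == 3, "motor.c", 42)
    print(prod.outbox)
```
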
The key still being "if you had opted in".
> As long as it is anonymous, it's a good thing, IMO.

As long as users give informed consent, then it's acceptable.

> It is strange that this community of all communities has such a resistance to this idea.

I don't think many people here have any resistance to this. It's simply true. I think a lot of people, though, think that benefit isn't sufficient to overcome the drawbacks.

Because the visual difference between abusive telemetry and benign or useful telemetry is 0. Same with disclosures.

And we are a very abused culture sitting in the middle of what I hope is peak surveillance capitalism.

Beat someone with a stick enough, and when you go to scratch your back and they flinch, it's not sensible to deride them for being irrational.