Hacker News
by cachestash 2200 days ago
Can anyone explain to me why I should be concerned about vs code telemetry? I have zero personal information in the IDE and all the code I work on is already in the public domain with an open source license, so why should I care?
11 comments

How right or wrong it is to collect information about people and what they do is not entirely determined by how much personal information there is to get.

Just like how right or wrong it is to break into someone's house is not entirely determined by what they take, or how much personal information you have in your house.

But yes, if you don't have anything valuable in your house, maybe you don't need to be concerned about people breaking in, but that doesn't make it more right.

Such analogies really depend on what the telemetry contains (granted, I don't know what it contains in this case). You describe it as stealing from inside the house, say, VSCode figuring out your email address and reporting it. But it might just as well be fairly anonymous, non-personal data, more like looking at the outside of a house and taking note of that. Say, VSCode reporting the theme you use, and just that.
> Say, VSCode reporting the theme you use, and just that.

Wouldn't that be like looking through the windows to see what clothes they wear or at what time they go to bed or who they sit at the table with?

Just because you are outside of someone's home doesn't mean you can't invade their privacy.

Why does an editor need to do telemetry to begin with? Mind your own business. If you don't want to give me something for free, don't. Don't make something free and then feel like you have the right to spy on me. Charge me for using it.

> Wouldn't that be like looking through the windows to see what clothes they wear or at what time they go to bed or who they sit at the table with?

Hehe with 'theme' I was actually just thinking about the color of the outer walls, should have made that clear. Even then: passing by a house and peeking through a window once is usually legal and as far as I know doesn't require consent (at least not in my country). Stopping in front of the window and staring inside for hours, not so much. Nor is passing by at the exact same time everyday and peeking inside.

> Why does an editor need to do telemetry to begin with? Mind your own business

I'm not advocating in any direction here, be it pro or against telemetry. My point merely is that before making analogies, you'd better first check what telemetry exactly is being collected otherwise your analogy might be completely off.

Telemetry is commonly used for statistics on how a product is used, to learn for example if the user interface is counter intuitive or obnoxious with a much better sample size than their own team.

I also note that we still don’t have a clue in this discussion what the telemetry even includes despite tools and probably even documentation detailing this already existing. But I guess it’s more fun to debate this from a philosophical standpoint than the product in question.

From everything I’ve read, MS (and other telemetry-happy developers like JetBrains) see telemetry as a way to gather statistics about UI interaction patterns and popularity of this or that component, so that they can prioritise development towards the most popular features and problems. It’s a bit like our good old “Popularity Contest” package in Linux, but applied to UI elements and behaviours. It’s the sort of thing that resulted in a massive Paste button on the first Office ribbon, because telemetry said Paste was the most used feature.

Sometimes I wonder if I should turn all telemetry on, so that they’d have a datapoint that actually matches my workflow rather than Joe Schmoe’s. It’s a bit invasive though, and culturally speaking is a horrible model (“I can’t change things unless you let me watch what you do all the time”).

Obviously if you care about privacy you should keep telemetry off, but to be honest, if you don’t trust Microsoft to respect common decency about private code, you just shouldn’t use a tool they built in the first place. I use JetBrains tools and trust them enough to leave telemetry on (“voting” for my preferred features, effectively). If you do any politically-sensitive work, though, you should absolutely stay the hell away - because then it doesn’t matter what they do with it today, but what they could do if they wanted (i.e. under pressure from authorities).

> Obviously if you care about privacy you should keep telemetry off, but to be honest, if you don’t trust Microsoft to respect common decency about private code, you just shouldn’t use a tool they built in the first place.

Isn't that the point of today's discussion: you can use the privacy-respecting alternative instead?

> I also note that we still don’t have a clue in this discussion what the telemetry even includes despite tools and probably even documentation detailing this already existing.

Unless there are clear statements and guarantees that nothing will change in future updates without the user's express consent, the statements themselves aren't worth much anyway in this kind of discussion.

Microsoft's privacy policies are notoriously opaque, to the point where you'll have trouble verifying that, for example, they aren't granting themselves the right to upload your source code. This observation almost invariably attracts downvotes, but anyone who thinks I'm exaggerating can easily refute the point by citing the places in Microsoft's documentation that say otherwise and guarantee not to change that in the future.

My point is that it's not just what information is being taken that matters. Even if you don't take anything it's still wrong to do things inside someone's computer or home without permission. Maybe VS Code asks for permission properly and gives the user sufficient control and has good defaults, I don't know, but it's difficult to do that properly because people don't understand computers as well as they understand the rest of life.
This (VS Code) is a product for developers, so are you saying that if you use VS Code you don’t understand computers?

That’s very arrogant of you

By "computers" I really meant "the complexity in the software that computers run". Computers do very complex things and most of those things can't be directly seen. So if you want to tell someone about what some software is doing, it's not always easy.
Because some other people might be working with code that is not in the public domain, or code whose mere existence should be kept hidden (so even relatively innocent things like project, file or branch names should be kept secret)?

I personally prefer tools that don't spy on me. In 99% of cases I probably won't care, but I don't want to take the chance of the 1% where a telemetry request would send out something I'd rather keep private which is why I want tools that are private by design.

My screwdriver doesn't spy on me and report what kinds of screws I use it with, the hammer doesn't either, I want my text editor to behave in the same safe and predictable manner.

As an application developer it's quite frustrating to be left completely in the dark about how people actually use my applications. All I can do is guess. Those guesses are most probably incorrect and the app won't be as good as it could.

Just a simple button click heat map would be very useful info to have. But then sending click heat map is the same thing as stealing credit card info in the minds of many ...
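A "simple button click heat map" of this kind can be little more than a counter per UI element. Here is a minimal Python sketch. It assumes the UI reports nothing but a hypothetical element id string for each click, with no coordinates, no file names and no user id:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_click_heatmap(clicked_ids: Iterable[str]) -> Counter:
    """Count clicks per UI element id (the ids are hypothetical labels)."""
    return Counter(clicked_ids)


def most_clicked(heatmap: Counter, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most-clicked elements, most popular first."""
    return heatmap.most_common(n)


if __name__ == "__main__":
    session_clicks = ["save", "run", "save", "settings", "save", "run"]
    heatmap = build_click_heatmap(session_clicks)
    print(most_clicked(heatmap, 2))  # [('save', 3), ('run', 2)]
```

If the client aggregates locally and sends only the counts, the individual click sequence stays on the machine.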

> to be left completely in the dark about how people actually use my applications

You don't have to be left in the dark. You can ask people for feedback (yes that used to be a thing) or run user testing sessions (yes that used to be a thing too but seemingly not anymore when we look at the quality of modern software).

> the app won't be as good as it could

I have yet to see any evidence that telemetry improves software quality enough to warrant the privacy trade-off. If there is a correlation, it seems to be a negative one: telemetry started becoming popular in the last decade, and the last decade is also when software started declining in quality or usability (see Windows 8+, certain changes to macOS and iOS, bloated or user-hostile websites, etc).

> Just a simple button click heat map would be very useful info to have.

That heatmap thing will also at least leak my IP address, software version and a persistent UID that will allow the backend server (whether self-hosted, or powered by a nasty ad-tech company like Google analytics) to keep a log of my IP changes and usage patterns.

> You can ask people for feedback (yes that is still a thing)

That's not very reliable. It's quite common for people to give feedback only when they are unhappy, so you can get feedback like "this is horrible" even though the app works nicely for the silent 99%.

> or run user testing sessions (yes that used to be a thing too but seemingly not anymore when we look at the quality of modern software).

Difficult to do for projects with $0 budget. I'm also interested in the long term (experienced) users behavior which is not possible with such testing sessions.

> That heatmap thing will also at least leak my IP address, software version and a persistent UID that will allow the backend server (whether self-hosted, or powered by a nasty ad-tech company like Google analytics) to keep a log of my IP changes and usage patterns.

* IP address - I don't care about your IP, that does not give me any useful info

* software version - sure, I'd like to know which version you run. Is that really privacy violation though?

* persistent UID - that's a matter of discussion, for me what's important is behavior within one session, connecting several sessions is not so important and I could do without it, so no persistent UID

Each of these items could be a matter of discussion - it would be nice to move the discussion from "all telemetry is literally evil" to "what's acceptable to collect?".
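The three bullets above can serve as a starting point for "what's acceptable to collect?" by encoding them as a payload schema. The app version is sent, the IP address is not a payload field, and the only identifier is random and regenerated on every launch. Here is a hypothetical Python sketch; the class and field names are illustrative:

```python
import json
import uuid


class TelemetrySession:
    """Hypothetical client-side session: one random id per app launch, no persistent UID."""

    def __init__(self, app_version: str):
        self.app_version = app_version
        # uuid4 is random, so two launches on the same machine get unrelated ids.
        self.session_id = str(uuid.uuid4())

    def event(self, name: str) -> str:
        # Only the version, the session id and the event name are serialized.
        # The client cannot hide its IP address from the server it connects to;
        # not storing it has to happen in the server's logging, not here.
        payload = {"app_version": self.app_version, "session": self.session_id, "event": name}
        return json.dumps(payload)


if __name__ == "__main__":
    first = TelemetrySession("1.2.0")
    second = TelemetrySession("1.2.0")
    print(first.event("open_file"))
    print(first.session_id != second.session_id)  # True
```

The id is random rather than derived from the machine. That means events can be grouped within one session, but sessions cannot be linked to each other from the payload alone.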

I disagree with feedback not being reliable. I think that detailed feedback from someone who is not happy gives you more details than a heatmap, for example. I also think that feedback from a user who takes the time to actually leave feedback (and so is more invested in the product, and likely to give you repeat business) might be more valuable than one-off users.

> I'm also interested in the long term (experienced) users behavior which is not possible with such testing sessions.

Is it not possible to reach out to those users and invite them to such a session in exchange for $$$?

> I don't care about your IP, that does not give me any useful info

True, but some malicious third parties might care, whether it's the analytics service itself (Google Analytics comes to mind) or even a law enforcement request to capture/access such data. You are basically creating a potential liability for the user; some people might not want the software to phone home for various reasons, and I think the default should always be safe, so telemetry should be "off" by default.

There's also the issue that telemetry is typically opaque and the user has no visibility or control over what is sent, so out of an abundance of caution they opt out. I think a good improvement would be to queue all the telemetry data locally, and then periodically ask the user to review, edit/redact & send it if they want to. Apple has done it relatively well there where if an app crashes they allow you to review the report before sending it, and I actually send these the majority of the time (unless it's a process dealing with sensitive data) despite having OS-level telemetry disabled.
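The queue-locally-then-review idea fits in a few lines. This is a hypothetical Python sketch, not an API of VS Code or any other product. Events are held in memory, `review()` shows the user exactly what would be sent, and `submit()` sends only what a user-supplied callback approves or redacts. The transport (`send`) is left to the caller.

```python
import json
from typing import Callable, Dict, List, Optional


class ReviewableTelemetryQueue:
    """Events wait locally and leave the machine only after user review."""

    def __init__(self, send: Callable[[List[Dict]], None]):
        self._pending: List[Dict] = []
        self._send = send  # transport supplied by the caller; not implemented here

    def record(self, event: Dict) -> None:
        # Nothing is sent at record time; the event just waits in the local queue.
        self._pending.append(event)

    def review(self) -> str:
        # Human-readable dump of exactly what would be sent.
        return json.dumps(self._pending, indent=2)

    def submit(self, approve: Callable[[Dict], Optional[Dict]]) -> int:
        """approve() returns the (possibly redacted) event to send, or None to drop it."""
        approved = [e for e in (approve(ev) for ev in self._pending) if e is not None]
        if approved:
            self._send(approved)
        self._pending.clear()  # reviewed events are either sent or discarded
        return len(approved)


if __name__ == "__main__":
    sent: List[Dict] = []
    queue = ReviewableTelemetryQueue(send=sent.extend)
    queue.record({"event": "crash", "file": "secret_project.py"})
    queue.record({"event": "theme_changed", "theme": "dark"})
    print(queue.review())
    # The user redacts the file name from the crash report and keeps the theme event.
    count = queue.submit(lambda e: {**e, "file": "<redacted>"} if "file" in e else e)
    print(count, sent)
```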

> I think that detailed feedback from someone who is not happy gives you more details than a heatmap, for example.

Detailed feedback is definitely nice, but it's quite rare & not sufficient. It's again one person's view, people also often can't articulate what's wrong. Usage patterns across many users may reveal what's wrong ...

> that feedback from a user who takes the time to actually leave feedback might be more valuable than one-off users.

Both are valuable - one-off users might be people who got confused enough to be discouraged from using the product. That's extremely useful info.

> Is it not possible to reach out to those users and invite them to such a session in exchange for $$$?

Impossible for projects with $0 budget.

It's also very unreliable since people working on artificial test data have very different behavior than when they are working on their production data.

This gets brought up all the time and what I can't figure out is why user interfaces have gotten worse over the last 10 years, even as developers have gathered unprecedented amounts of information from telemetry. Is the most common feedback developers get from telemetry "good, but needs more whitespace"?
> what I can't figure out is why user interfaces have gotten worse over the last 10 years

I don't know, but I'm pretty sure telemetry is not making UI worse.

My point is it's not making it better. So what am I, a humble user, getting in return for sending potentially sensitive data to third parties?
Neat comment id, by the way (2345678)
Telemetry tells your ISP and national military your usage patterns, too. When, where, and how often you use the tools is itself private.

Imagine a private journal that reported to the government every time you wrote in it, and what city you were in when you did so.

Furthermore, VS Code is specialized software. Using it in certain places allows a specific user to be tracked and identified out of millions of more "normal" traffic patterns, as developers are still a tiny minority in society.

> Can anyone explain to me why I should be concerned about vs code telemetry?

Why do you close the door when you go to the toilet? It's not like what you do in there is unknown.

Because it makes others uncomfortable if you don't. Wrong analogy.
Because maybe developers in your bank or hospital are using IDEs with nasty telemetry that might expose data on you? Maybe they are editing a branch called "workaround-for-mr-cachestash-bankrupcy-account-bug"?
Never say never, but still, this is super unlikely and not at all what telemetry is.

They report on things like button usage, time spent in app etc. Possibly personal information about the developer, although unlikely.

But your code? No no, they're probably not looking at your buggy code...

Microsoft has been known to do things like log all command line arguments in dotnet, for example https://docs.microsoft.com/en-us/dotnet/core/tools/telemetry

And then they post the results publicly. https://devblogs.microsoft.com/dotnet/what-weve-learned-from...

You can see the telemetry events VS Code sends: https://code.visualstudio.com/docs/getstarted/telemetry#_out...

Does anything here seem malicious?

Indeed. And since they are very open as to what they collect, feel free to point out anything disturbing.

As for the publicly posted results, it seems to me like they try to understand who uses their products and how. That's not worse than basic website analytics, and that's data they certainly need in order to prioritise their development.

But I may be missing your point. Can you point to a specific datapoint in those documents that you object to, and explain why?

When they post full CSV files with all kinds of command line arguments, including typos, who's to say your "dotnet run fix-aspycts-bankrupcy-account" command, where you forgot the first "run" argument, won't end up in a CSV some day?
Except that they don't seem to be recording the arguments to `dotnet run`: https://docs.microsoft.com/en-us/dotnet/core/tools/telemetry...
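Whatever the dotnet CLI does internally, the general safeguard under discussion is to record only values from a fixed allowlist and never free-form text the user typed. Here is a hypothetical Python sketch. The command list and record format are made up for illustration; this is not how the .NET SDK is implemented:

```python
from typing import Dict, List, Union

# Hypothetical allowlist of subcommands that are safe to report verbatim.
KNOWN_COMMANDS = {"build", "run", "test", "new", "publish"}


def scrub_command_line(args: List[str]) -> Dict[str, Union[str, int]]:
    """Build a telemetry record from CLI arguments (program name excluded).

    Only a known subcommand name is kept. Anything else, including a mistyped
    command or a branch name, is replaced by a placeholder, and the remaining
    arguments are reduced to a count.
    """
    command = args[0] if args and args[0] in KNOWN_COMMANDS else "<unknown>"
    return {"command": command, "arg_count": max(len(args) - 1, 0)}


if __name__ == "__main__":
    # The mistyped command from the comment above never reaches the record.
    print(scrub_command_line(["fix-aspycts-bankrupcy-account"]))  # {'command': '<unknown>', 'arg_count': 0}
    print(scrub_command_line(["run", "--project", "Bank.csproj"]))  # {'command': 'run', 'arg_count': 2}
```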
And how do you know that? Did you check the source? Oh wait...
And how exactly do you think they would look at all the source code in the world? Humans are too expensive.

So probably AI. How would AI tell the difference between valuable banking software full of bugs and your side project full of bugs?

Also from there, why would you include a customer's personal information in your code, or even have it on your own machine?

Really, the chances of vscode's telemetry leaking personal user information are super extra low, unless you're obviously doing something wrong with your code.

Ah, and finally, if you're using github, they have a much more efficient way of getting your code anyway.

I value silence in my network traffic.
It's interesting to look at the size of webpages in everyday browsing, which can drop from tens of megabytes to a few kilobytes when tracking/analytics scripts are blocked.

I wonder what the back-of-the-napkin calculations would be for network traffic and energy savings (local and server side) from regulating tracking and telemetry?
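The back-of-the-napkin calculation is simple multiplication: extra bytes per page view, times page views, times the energy needed to move a gigabyte. Here is a Python sketch in which every input is a placeholder to be replaced with real figures. Published estimates of energy per gigabyte transferred vary widely, so no particular value is implied here.

```python
def annual_tracking_energy_kwh(extra_bytes_per_view: float,
                               views_per_day: float,
                               kwh_per_gb: float) -> float:
    """Energy attributable to tracking payloads over one year (network transfer only)."""
    gb_per_year = extra_bytes_per_view * views_per_day * 365 / 1e9
    return gb_per_year * kwh_per_gb


if __name__ == "__main__":
    # Placeholder inputs, not measurements: 2 MB of tracking scripts per view,
    # one million views per day, 0.1 kWh per GB transferred.
    estimate = annual_tracking_energy_kwh(2e6, 1e6, 0.1)
    print(f"{estimate:,.0f} kWh per year")  # 73,000 kWh per year
```

This covers transfer only. Server-side storage and processing of the collected data, and client-side CPU time spent running the scripts, would be separate terms.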

Is there an environmental case to be made against modern web practices on tracking and telemetry?

I've really come to dislike Google over the past decade or so, but I do like that their Speedtests, Lighthouse etc don't hide this fact from you.

Pretty much all sites I've been asked to look at were getting low scores because of Google Tag Manager, AdSense and the like. It has a very measurable impact, and yeah, removing it speeds up the page.

The environmental case will probably not fly for regulation, but it just might in public shaming of large companies. "Hey, $company, your usage of $trackingTech uses as much power per year as an average family of four. Is that really in line with your green approach?"

Thank you!

This is exactly one of the reasoning pillars I'm using in arguments about the "innocuous" nature of telemetry and tracking.

Any new product that collects telemetry or does tracking requires storage that has to be bought and kept connected to a high-availability power source, since it performs I/O all the time.

I agree with that. I also don't care too much about the telemetry itself, but silence on the network should be the default.
>>I have zero personal information in the IDE and all the code I work on is already in the public domain with an open source license, so why should I care?

Off the top of my head: API tokens and other credentials that live right in the file while you develop and debug.

Those are quite sensitive. To put more weight on the issue: [0]

[0] https://medium.com/@stestagg/stealing-secrets-from-developer...
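Whatever VS Code itself sends, the scenario above suggests a defensive habit for anything that can carry file contents, such as crash reports or bug reports: scrub secret-looking strings before they leave the machine. Here is a minimal Python sketch. The two rules are illustrative assumptions, not a complete scanner and not something VS Code is known to do.

```python
import re

# Illustrative rules only; real secret scanners ship far larger rule sets.
SECRET_RULES = [
    # Shape of an AWS access key id: "AKIA" followed by 16 uppercase letters/digits.
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<redacted>"),
    # "key = value" / "key: value" where the name looks sensitive; keep the name.
    (re.compile(r"(?i)\b(api[_-]?key|token|secret|password)(\s*[:=]\s*)\S+"), r"\1\2<redacted>"),
]


def redact(text: str) -> str:
    """Replace secret-looking substrings before text is attached to any report."""
    for pattern, replacement in SECRET_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact('API_KEY = "sk-123"'))  # API_KEY = <redacted>
    print(redact("id AKIAABCDEFGHIJKLMNOP end"))  # id <redacted> end
```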

Since when does the telemetry even send out any contents of any files?

That's called stealing not telemetry.

> Since when does the telemetry even send out any contents of any files?

Since pretty much forever? At the very least, many programs with built-in telemetry have included things like memory dumps of key areas at the time of a crash, which could include data the user was working on at the time.

More seriously, I invite you to read Microsoft's extensive privacy policies and try to satisfy yourself that they don't grant themselves the right to upload your code. They are sufficiently nebulous and ambiguous that they could probably be interpreted that way.

>> Since when does the telemetry even send out any contents of any files?

To me it's not evident otherwise until I'm able to observe the bare data itself: not obfuscated, not in some proprietary format to secure it in transit, but the raw stuff. MS states there's no reliable way to let people see the data being collected (even under GDPR), as there's no sign-in experience provided.[0]

While part of the statement is true, most privacy-conscious VSC users are aware that every installation of the product has a unique `machineId` property. It can be found under Output -> Log (Shared).

The [0] provides some elaboration:

"We do send information that helps us approximate a single user for diagnostic purposes (this is based on a hash of the network adapter NIC) but this is not guaranteed to be unique. For example, virtual machines (VMs) often rotate NIC IDs or allocate from a pool. This technique is sufficient to help us when working through problems, but it is not reliable enough for us to 'provide your data'."

So, given that a user can be identified by a NIC hash plus a machineId (which looks like a UUID), it should be easy to look up a given user's collected data. As soon as I'm able to verify that no really critical data is collected, I'll switch back from VSCodium.
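The documentation quoted above only says the approximate user id is "based on a hash of the network adapter NIC". For illustration, assume it is a SHA-256 over the MAC address written as hex; the exact hashing scheme is an assumption here. The short Python sketch below shows why such an id is stable enough to link every session from the same machine, yet changes whenever a VM is handed a different NIC:

```python
import hashlib
import uuid


def nic_based_id(mac: int) -> str:
    """Hypothetical machine id: SHA-256 of a 48-bit MAC address as 12 hex digits."""
    return hashlib.sha256(f"{mac:012x}".encode("ascii")).hexdigest()


if __name__ == "__main__":
    # uuid.getnode() returns this machine's MAC as an int, or a random
    # 48-bit number if no hardware address can be found.
    mac = uuid.getnode()
    print(nic_based_id(mac) == nic_based_id(mac))  # True: same NIC, same id on every launch
    # A VM that is given a different NIC (simulated here) gets an unrelated id.
    print(nic_based_id(mac) == nic_based_id(mac ^ 1))  # False
```

Because the input is hardware rather than random state, the id survives reinstalls. That is what makes it useful for diagnostics, and also what makes it identifying.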

[0]https://code.visualstudio.com/docs/getstarted/telemetry

FYI, to see what's being sent, do this: "F1 > Log Level > Trace" and then "View > Output > Log (Telemetry)"
I choose to trust MS are using the telemetry for improving VS Code, which I genuinely love, and accept the "risk" that MS somehow abuses the telemetry. I see this risk way lower than e.g. Google or Facebook abusing data they can collect about me.
If you have nothing to hide you have nothing to fear.
Google/Apple maps where you live and go, Facebook knows who you deal with, Microsoft knows how you use your computer and Amazon knows what you buy.

It's ok if you're a boring type for your whole life.