by throwawaybutwhy 1354 days ago
The package phones home. One has to set an env var or change several lines of code to prevent that.

I think their plan was/is to make money on corporate licenses and support, as well as SaaS/cloud products.
They won't. The only way they can return investors' money is by selling the company to Amazon/Microsoft/Google, and in this economy that won't happen.
Hey, yes, we've decided to keep it opt-out for now; it collects only basic, fully anonymized statistics. Here is the full policy: https://dvc.org/doc/user-guide/analytics

It should be easy to opt out, though: run `dvc config core.analytics false` or set the environment variable `DVC_ANALYTICS=False`.
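For scripts or CI jobs that shell out to DVC, the two opt-outs mentioned above can be applied programmatically. This is a minimal Python sketch, not part of DVC itself: the helper names `env_without_analytics` and `run_dvc` are hypothetical, and the variable name `DVC_ANALYTICS` is taken from the comment as written, so confirm it against the analytics policy page for your DVC version.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Opt-out variable as named in the comment; verify the exact name your
# installed DVC version reads (assumption, not checked against DVC source).
OPT_OUT_VAR = "DVC_ANALYTICS"

# Persistent alternative quoted in the comment: stores the setting in DVC's
# config instead of relying on the environment of every invocation.
CONFIG_OPT_OUT = ["dvc", "config", "core.analytics", "false"]


def env_without_analytics(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the analytics opt-out set.

    The input mapping (or os.environ) is never modified.
    """
    env = dict(os.environ if base is None else base)
    env[OPT_OUT_VAR] = "False"
    return env


def run_dvc(args: List[str]) -> subprocess.CompletedProcess:
    """Run a dvc command with analytics disabled for that child process only."""
    return subprocess.run(["dvc", *args], env=env_without_analytics(), check=True)
```

With DVC installed, `run_dvc(["pull"])` runs `dvc pull` with the variable set only for that child process, while running `subprocess.run(CONFIG_OPT_OUT, check=True)` once persists the setting in DVC's config file.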

Could you please clarify what you mean by `several lines of code`? We tried to make what we collect very open and visible (it prints a large message when it starts) and to make it easy to disable.

This seems pretty anti-user, since most users prefer opt-in. It seems shady to keep behavior that users don’t like and that potentially harms them (you think it’s fully anonymized).

That’s your prerogative, as it’s your project, but it makes me wonder what else you’re doing that’s against users’ best interest and in your own.

We are fully aware that it raises concerns. Trust me, it hurts my feelings as well. For example, on our websites (dvc.org, cml.dev, etc.) we don't use any cookies, GA, etc.

We've tried to make it as open as possible: the code is available (it's open source), we write openly about this at the very start, we have a policy online, and we made it easy to opt out. If you have other ideas on how to make it even friendlier or more visible, please let us know.

Still, so far we've preferred to keep it opt-out, since it's crucial for us to see major product trends (which features are used most, month-over-month product growth, etc.). Realistically, opt-in at this stage won't give us this information.

Yet there are many successful projects that don’t collect this information. So it’s not crucial for them but is crucial for you.

I think the challenge I have is that since you’re collecting IP addresses, there will be an opportunity for abuse. And there seems to be a rule that any data that can be misused eventually will be.

Since you’re not willing to make it opt-in, I think perhaps the only other way would be to support an automated distribution that doesn’t include it, so users can at least easily choose a version.

I admire you for responding to this thread, and to me, as it’s definitely not easy. I just feel that one of the main benefits of open source is its alignment with users’ interests, so it’s discouraging when an open source project chooses to ship code that users don’t want.

Right, many projects use opt-in, but many others use opt-out:

https://docs.brew.sh/Analytics
https://docs.npmjs.com/policies/privacy#how-does-npm-collect...
VS Code, etc.

> I think the challenge I have is that since you’re getting IP address that will be an opportunity to abuse.

Yes! And because of this we are migrating to a new package and infrastructure: https://github.com/iterative/telemetry-python. DVC's sister tool MLEM already uses it: it doesn't send (save) IP addresses or use GA or any other third-party tools, and the data is saved to BigQuery, which we'll eventually make publicly accessible (https://mlem.ai/doc/user-guide/analytics). The aim is to be fully GDPR compliant. The current setup is a legacy system DVC had in place, and there was never any intention to use those IP addresses.

> I think perhaps the only other way would be to support an automated distro that doesn’t include it so users are at least able to easily choose a version.

Thanks. To some extent, a brew-like policy (not sending anything significant before the user has had a chance to disable it, and showing a clear, explicit message) should mitigate this, but I'll check whether it works this way now and whether it can be improved.
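The brew-like policy described above can be sketched as a small gate that runs before any event is sent. This is a hypothetical Python illustration, not DVC's or Homebrew's actual code: the `TOOL_ANALYTICS` variable, the notice text, the marker file, and the `should_send` function are all invented for the example.

```python
from pathlib import Path
from typing import Callable, Mapping

# Hypothetical notice text and opt-out variable, for illustration only.
NOTICE = (
    "This tool collects anonymous usage statistics. "
    "Set TOOL_ANALYTICS=False to opt out."
)
FALSY = {"false", "0", "no", "off"}


def should_send(
    marker: Path,
    env: Mapping[str, str],
    show: Callable[[str], None] = print,
) -> bool:
    """Decide whether this run may send analytics under a brew-like policy."""
    # An explicit opt-out always wins, even on the very first run.
    if env.get("TOOL_ANALYTICS", "").strip().lower() in FALSY:
        return False
    # First run: show the notice, remember that it was shown, and send
    # nothing, so the user can opt out before any event leaves the machine.
    if not marker.exists():
        show(NOTICE)
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch()
        return False
    # Notice already shown on an earlier run and no opt-out: sending is allowed.
    return True
```

The key design choice is that the first run only prints the notice and records that it was shown; nothing is sent until a later run, by which point the user has seen how to opt out.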

I wonder what the GDPR implications of this are. I note that other projects (e.g. Cura) have switched their telemetry to opt-in.

https://github.com/Ultimaker/Cura/issues/2810