by SebiH 2662 days ago
> We have put a lot of effort into making this library useful to you. To help us make this library even better, it collects ANONYMOUS error messages and usage statistics. See d6tcollect for details including how to disable collection. Collection is asynchronous and doesn't impact your code in any way.

That seems really out of place. I'm somewhat used to automatic data collection from applications, but automatic data collection from programming libraries / frameworks? Really?

6 comments

I have a strong, negative reaction to this. I read the collection code (in d6tcollect), and it does as they claim (with perhaps minor qualms about what anonymous really means). And (for now) it's easy to disable without mucking around in the code. But in fact I'm not sure what I was really looking for, since I can't imagine using this library if I might need to reverify that they still aren't collecting anything I don't want collected.

On the other hand, I'm glad that they mentioned it --- I would have a much more negative reaction if I had to find this out on my own.

I feel like an even better approach here would have been for the developers to offer the data collection functionality as a separately installed module, and then make the case to the user during installation of the main package.
This needs to be opt-in or banned. Otherwise anyone not collecting data will be at a disadvantage, meaning sooner or later every library will be collecting data. And it doesn't stop at collecting usage statistics: some popular software is already recording which websites you visit and what you search for! (For example, how to do x in library y, so they can improve their documentation or whatnot.)
This library has a GDPR consent problem: just putting it at the end of the README doesn't cut it. I wonder if this is the first open source library to violate GDPR.
GDPR really only cares about personally identifiable information, afaik. You don't need consent for anonymized stats.
If you read what it sends, they're sending function names / kwargs / module names, etc. Put an IP in there (for example), or a person's name, etc., and you have a potential GDPR violation.
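To illustrate the point above, here is a minimal sketch of how capturing kwargs can pull personal data into a telemetry payload. This is not d6tcollect's actual code: the `collect` decorator, the `sent` list (a stand-in for a network send), and the `connect` function are all hypothetical names made up for this example.

```python
import functools


def collect(func):
    """Hypothetical telemetry decorator (illustrative only, not d6tcollect's code)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a payload from call metadata: module, function name, and kwargs.
        payload = {
            "module": func.__module__,
            "function": func.__name__,
            "kwargs": dict(kwargs),
        }
        # Stand-in for sending the payload over the network.
        wrapper.sent.append(payload)
        return func(*args, **kwargs)

    wrapper.sent = []
    return wrapper


@collect
def connect(host=None, user=None):
    # Hypothetical library function whose arguments happen to be personal data.
    return f"{user}@{host}"


connect(host="203.0.113.7", user="jane.doe")
print(connect.sent[0]["kwargs"])
```

The payload now contains an IP address and a user name even though the collector never asked for either; whether that counts as "anonymous" depends entirely on what callers pass in.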
A while back I found out that the popular Serverless framework/library tracks and reports back usage (https://serverless.com/framework/docs/providers/aws/cli-refe...). This similarly struck me as really out of place, and (at the time at least) it didn't seem sufficiently disclosed or described in the docs. If I NPM install it and invoke it, have I implicitly agreed to this?
Also interesting to see what happens when such a library becomes nested within a more popular library. Disabling should still work, but fewer people would be aware.
I thought this was a neat package and would have tested it. I won’t install or use it because of this.

I don’t like packages that require external access to function. I understand the business model and think there are clear ways to do this (plotly and graphistry come to mind), but I don’t think the benefits of using these types of libraries outweigh the downsides.

At least this can be easily refactored out; plotly and graphistry don’t really function well without the API calls. Plotly offline exists, but trying to keep track of features between the two is a pain. And the reasoning given for the API (massive-scale compute) could be easily abstracted for local mode if they wanted.

Actually, Plotly offline works exactly the same way as Plotly online: same code runs on both ends, total feature parity from Python :)