by aodj 1924 days ago
I was really interested in this as an easy way to help people get to grips with the intricacies of pandas and working with dataframes. But after looking over the source code that's published on PyPI, I don't think I can recommend this package to anyone, because of the degree of tracking that's present in the code.

There's no mention of the Segment tracking in the docs, and I don't see any way for the user to opt out of it, which I think is an immediate GDPR issue.

Given that you are logging metadata about the dataframes in use along with the email and name of the logged-in user, I can't see this ever being used in an environment where sensitive data is processed, since it could leak PII that's easily tied to a given company via the email address.

This is a great idea, and I think if you can go with the BSD license and provide a way for people to opt out of tracking (or ideally flip it and let them opt in), this could be used in any number of industries. As it stands, I just don't think this will pass a data audit at any large company, which is a real shame.
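
To make the opt-in suggestion concrete, here is a minimal sketch of what I mean. Everything in it is hypothetical: the `EXAMPLE_TELEMETRY_OPT_IN` variable and both function names are made up for illustration and are not taken from the package's code. The idea is only that nothing is built or sent unless the user has explicitly said yes, and that the payload carries coarse counts rather than emails or column names.

```python
import os


def telemetry_enabled(env=None):
    """Return True only if the user has explicitly opted in.

    EXAMPLE_TELEMETRY_OPT_IN is a hypothetical variable name, not a real
    setting of any package.
    """
    env = os.environ if env is None else env
    value = env.get("EXAMPLE_TELEMETRY_OPT_IN", "")
    return value.strip().lower() in {"1", "true", "yes"}


def build_event(name, df_shape, env=None):
    """Build an event payload, or return None when the user has not opted in."""
    if not telemetry_enabled(env):
        return None
    # Only coarse metadata: row and column counts, no email, no header names.
    return {"event": name, "rows": df_shape[0], "columns": df_shape[1]}
```

With no variable set, `build_event` returns `None`, so the default is that no event exists to send.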

1 comment

Thanks for that callout. We appreciate the perspective and will work on disclosing where logging occurs, making it easier for the user to opt out, and reviewing what we log and how we can reduce it. I think your analysis is correct, but as a summary: we log the email provided by the user when they first create a mitosheet, the size of the dataframe, the header names of the dataframe, and the user's interactions with the UI.

For current users who have told us that they are not comfortable with logging, we have been able to turn off logging for their specific accounts. So if you're interested in continuing to check out the tool while we make those improvements, just let us know.