Hacker News
by alkonaut 1155 days ago
I get that for some classes of "local-only" apps like compilers (famous from a recent discussion on the same topic), network communication can be surprising and therefore feel unnecessary. But for an app whose sole purpose is sending and receiving lots of sensitive private data to Dropbox servers, who has the energy to be outraged that there is also some other anonymous data sent such as program crash info?

I mean, Dropbox has the contents of my files. Should I find it creepy or unnecessary that they know how much RAM I have or what the last exception was?

3 comments

I am on the fence as to whether I agree with you, but I'll expand on one aspect: these folks keep citing how the number of DNS lookups (what Pi-hole reports/blocks) is extraordinary, exceeding all other vendors they use. That says pretty much nothing about the nature of the payload, other than that Dropbox likely has very granular client uptime data. The client could cache the DNS response instead of doing so many lookups, and some amount of rage would disappear.
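
To make the caching point concrete, here is a minimal sketch of a client-side lookup cache. It is not how the Dropbox client works (nothing in this thread says how it resolves names); `CachingResolver`, `system_resolve`, and the fixed 300-second lifetime are all made up for illustration. `socket.getaddrinfo` does not expose the record's real TTL, so a fixed lifetime stands in for it.

```python
import socket
import time
from typing import Callable, Dict, List, Optional, Tuple


def system_resolve(host: str) -> List[str]:
    """Resolve a hostname through the OS resolver and return its unique addresses."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Hypothetical helper: reuse a lookup result until a fixed lifetime expires."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # assumed lifetime, not the DNS record's TTL
        resolve: Optional[Callable[[str], List[str]]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._resolve = resolve or system_resolve
        self._clock = clock
        # host -> (expiry time, addresses)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no DNS query is sent
        addresses = self._resolve(host)
        self._cache[host] = (now + self._ttl, addresses)
        return addresses
```

With something like this, repeated `lookup()` calls inside the lifetime produce one query instead of many, which is all a Pi-hole counter would see. Many operating systems and resolver libraries already cache, so whether a client needs its own cache depends on how it resolves names.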

For any local program, I can block its access to the network and still use it. For a program whose functionality requires internet access, I can either inspect every outgoing packet for exfiltrated data, or I can choose to trust some programs to be non-malicious based on their reputation. That trust is incredibly fragile, and unannounced spying/telemetry breaks it.

I'm wondering where the line is between acceptable and unacceptable logs. Obviously no one appreciates analytics used by marketing teams, but virtually every internet service has logs used by engineers (which seem to be what this post is about). A few factors that seem relevant:

- Is the service running locally?

- Do we trust/expect that the data is not used for marketing (i.e. would the user have complained if the domain was "error-reporting.dropbox.com")?

- Is the data anonymous? (Think twice, everyone who has IPs or user IDs in request logs.)

- Did we agree to relevant ToS or privacy policies?

If we think carefully about this, I'd bet that most people here have used or even implemented some form of logging that has privacy problems.
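
On the request-log point in the list above, one common mitigation is to truncate client IPs before they are written anywhere. This is only a sketch: `truncate_ip` is a made-up helper, and the /24 and /48 prefix lengths are assumptions, not a legal or regulatory threshold. A truncated address is less identifying, but it is not automatically anonymous.

```python
import ipaddress


def truncate_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an IP address so a log line no longer pins down one client."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the network from an address with host bits set.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737 / RFC 3849), not real clients.
    print(truncate_ip("203.0.113.77"))           # 203.0.113.0
    print(truncate_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

User IDs in the same log line would need their own treatment; truncating the IP does nothing for them.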

It’s not that difficult to make anonymous (usually pseudonymous) usage stats. Of course you don’t store IPs, computer names, user names, emails, detailed geographical data etc. I think in the past this was a lot messier, but these days with GDPR it’s quite easy to draw the line. Basically, store nothing that identifies an individual, and not so much data (entropy) tied to one pseudonymous user that they could be re-identified.
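
A rough sketch of what that could look like in a client, assuming an allowlist approach. Every name here (`ALLOWED_FIELDS`, `new_install_id`, `bucket_ram_gb`, `scrub_event`) and every field name is hypothetical. The idea is to keep only fields known to be harmless, attach a random per-install ID that is not derived from the user or machine, and coarsen values such as RAM so they carry less entropy.

```python
import uuid
from typing import Any, Dict

# Assumed allowlist: anything not named here (IP, user name, email, ...) is dropped.
ALLOWED_FIELDS = {"app_version", "os_family", "exception_type"}


def new_install_id() -> str:
    """Random pseudonymous ID, generated once per install and not derived from user data."""
    return uuid.uuid4().hex


def bucket_ram_gb(ram_bytes: int) -> int:
    """Round RAM down to a power of two in GiB (minimum 1) so exact sizes are not stored."""
    gb = max(1, ram_bytes // (1024 ** 3))
    bucket = 1
    while bucket * 2 <= gb:
        bucket *= 2
    return bucket


def scrub_event(raw: Dict[str, Any], install_id: str) -> Dict[str, Any]:
    """Build the event that would actually be sent from a raw, possibly identifying one."""
    event = {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
    event["install_id"] = install_id
    if "ram_bytes" in raw:
        event["ram_gb_bucket"] = bucket_ram_gb(int(raw["ram_bytes"]))
    return event
```

An allowlist fails closed: a field someone adds later is not sent until it is explicitly reviewed. Exception messages and stack traces are left out on purpose, since they can contain file paths or user names.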