They'd be well advised to make this opt-in only, for legal reasons. This is not going to go down well in a lot of places, and they might be exposed to lawsuits.
Opt-in analytics are useless unless a large part of your userbase just clicks through the entire wizard without thinking, and there’s little overlap between that kind of user and Homebrew’s userbase.
Homebrew was fine before they started collecting analytics. There are plenty of great package managers outside the macOS ecosystem that don't use analytics.
Yes. More evidence for an observation I've been making: ultimately, software is always about what its creator wants, not what the user wants. I'm working on moving to running only my own software for that reason.
As in the post: they're intending to drop the GA part entirely within 90 days, and it sounds like the new metrics are entirely anonymous, and so not covered by GDPR etc. IANAL but as far as I can tell that should avoid all legal concerns once GA is gone.
Why does a package manager need to track their users at all?
If you want usage statistics for packages just track how often individual packages are downloaded on the server side. A maintainer has no need to know who's installing what.
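To make the server-side suggestion concrete, here is a minimal Python sketch of counting downloads per package from access logs. The log format (`<ip> <path>`) and the `count_downloads` helper are hypothetical; the point is only that the IP can be parsed and thrown away, leaving nothing but per-package totals.

```python
from collections import Counter


def count_downloads(log_lines):
    """Count downloads per package name from access-log lines.

    Hypothetical log format: "<ip> <path>", where the path ends in
    "<name>-<version>.tar.gz". The IP is split off and discarded, so
    the result holds only per-package totals.
    """
    counts = Counter()
    for line in log_lines:
        _ip, path = line.split(maxsplit=1)  # _ip is never stored
        filename = path.strip().rsplit("/", 1)[-1]
        # Split on the LAST "-" so hyphenated names stay intact
        # ("node-build-4.0.tar.gz" -> "node-build"), assuming the
        # version string itself contains no hyphen.
        name = filename.rsplit("-", 1)[0]
        counts[name] += 1
    return counts
```

For example, two log lines for `/packages/wget-1.21.tar.gz` (from different IPs) and one for `/packages/node-build-4.0.tar.gz` yield `Counter({'wget': 2, 'node-build': 1})`.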
> If you want usage statistics for packages just track how often individual packages are downloaded on the server side. A maintainer has no need to know who's installing what.
To be clear: Homebrew has no idea which users are installing what. We only store counters for package install, failure, etc. events, and everything that's stored is visible on the Homebrew website[1].
Homebrew's architecture doesn't really have a "server side" in the way your suggestion requires: the formulae and bottle components rely heavily on public services like GitHub Packages and GitHub Pages, which don't offer those kinds of analytics.
What kind of decisions do you make based on the analytics? Do you drop unpopular packages? To me, one of the advantages of a package manager is having a huge database of long-tail packages that are just one command away. If you kept only the popular packages you might as well just have an installer that installs them all together in bulk.
Yes: Homebrew deprecates and/or disables packages if we see evidence that they're unmaintained and not actually supported on the platforms we support, or only used by a tiny fraction of users while also requiring disproportionate maintainer time (e.g. due to complex or flaky builds).
The goal is to balance conflicting user interests: 99% of users want maintainer effort focused on the top 100 (or 500, or 1000) packages, and many of those packages also require significant maintainer effort (e.g. making sure that they don't cause transitive breakages).
> Why does a package manager need to track their users at all?
According to https://docs.brew.sh/Analytics they use it to measure how often formulas fail to install, to get overall metrics on which OS versions are used, and to correlate those (i.e. to tell on which OS versions specific packages fail to install correctly).
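The correlation described above needs nothing more than aggregate counters. Here is a minimal Python illustration, not Homebrew's implementation: the `EventCounters` class and the `"install"`/`"install_failure"` event names are hypothetical, and no user identifier is ever recorded.

```python
from collections import Counter


class EventCounters:
    """Aggregate counters keyed by (event, package, os_version).

    Hypothetical sketch: record() accepts no IP address or user token,
    so nothing stored can be tied back to an individual user.
    """

    def __init__(self):
        self.counts = Counter()

    def record(self, event, package, os_version):
        self.counts[(event, package, os_version)] += 1

    def failure_rate(self, package, os_version):
        # Counter returns 0 for missing keys without inserting them.
        ok = self.counts[("install", package, os_version)]
        failed = self.counts[("install_failure", package, os_version)]
        attempts = ok + failed
        return failed / attempts if attempts else 0.0
```

Comparing `failure_rate("wget", "13")` against `failure_rate("wget", "14")` is the kind of "does this package break on this OS version?" question the docs describe, answered without knowing who ran either install.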
> A maintainer has no need to know who's installing what
Aside from the IP, they don't know who's installing what, and in the new model announced in this post they now don't store IPs or any other user token at all, so it should be purely anonymous aggregate metrics.
I agree that they don't need to know the "who", but it is perfectly understandable that they want to know "what" is being installed. And as part of the "what", they would want to know on which platform, and whether the install succeeded or failed, and probably a few other metrics about the install to ensure that things are working correctly and identify gaps that should be filled.
Based on what I read on the site, that looks like exactly what they are doing, and they are explicitly NOT storing information that would identify "who".
All of that should go in a trouble ticket from a whiny user, which we know this type of user would be.
Edit: Also, this is for macOS. Choose a few standard OS versions to support and test them. If a system update will fix the issue, then it shouldn't be fixed at the package manager level.
Does anybody put a paywall in front of submitting tickets? If anyone could, it seems like Homebrew could, i.e. $10 to submit, and maybe more for a larger problem.
> Why does a package manager need to track their users at all?
Do any of you actually work in this industry shipping software products to end users? Without telemetry the problem there is literally one of trying to read the mind of your end users to figure out what they're doing, hoping that your internal CI manages to reflect the configuration in their environment.
I think HN has a very varied audience - some work in the industry, others want A/B testing to be made illegal on the grounds that it is non-consensual mind-control experimentation :P
I am in both groups. I work in the industry, and I am so tired of colleagues wanting to grab all the data they can get their grubby hands on and then barely using it for anything useful. So many companies collect data just in case.
Users report issues to GitHub? It's not like Brew users aren't sophisticated in that sense.
In addition to it being INCREDIBLY slow, now I have to worry about what it might spy on. If I have a problem, I'm more than happy to go to GitHub (or whichever site it's hosted on) and report it.
This industry has managed to ship software products without telemetry just fine - mass-collecting usage data from end users is only a relatively recent trend.
Homebrew is a package manager with thousands of packages, not a file server. We maintain those packages, and knowing when they break (or can be deprecated due to lack of use) is critical to the project's sustenance.
The software I used to be employed to maintain has actually broken Homebrew compiles when both were installed at the same time (I think I made that better, but I never got the PM who actually owned the product to spend the resources to fix it properly).
A good example of how the configuration in the end user environment can affect package installation.
Maybe they want to include the most common packages in their unit tests, or understand usage patterns so they can prioritize development?
It’s very hard to write and maintain good software without knowing how it’s used. No package manager needs to know how you specifically use it, but aggregate data and the ability to identify scenarios it does not handle well are both very important for SW lifecycle.
It's 2023. Hard drive space shouldn't be an issue. Test installing the full software suite, make it work, and you know the lesser installs will all work.
Assuming their claims of anonymity are true, they won't be tracking users at all.
I imagine they can get much richer metrics through this as opposed to only tracking downloads on the server side.
I'm not saying I like it. In fact, I plan to keep it disabled. I'm just saying it's a bit naïve to think client-side analytics are the same as server-side download tracking.
Richer how, exactly? I fundamentally don't 'get' what richness they actually need.
If anything they need money, of course, and to know their software works for their users. Prior to release have a test system install the full base, test those packages work, and you know anything less will work too.
I haven't looked at what they're actually collecting, but here are a few things that come to mind:
- Time to install packages
- Versions of things
- Has the compilation (when required) failed? What dependency versions are installed?
- CPU architecture
- OS version
...
There's a lot more that can be sent from the client that's not available on the server side.
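As a rough illustration of the client-side fields listed above, here is a hypothetical Python sketch of an install-event payload. The `InstallEvent` name, its fields, and `build_event` are made up for this example and are not Homebrew's actual schema; `platform.machine()` and `platform.mac_ver()` are standard-library calls.

```python
import platform
import time
from dataclasses import dataclass, asdict


@dataclass
class InstallEvent:
    # Hypothetical payload shape; field names are illustrative only.
    package: str
    version: str
    install_seconds: float
    build_failed: bool
    cpu_arch: str
    os_version: str
    # Deliberately no IP address, username, or per-machine token.


def build_event(package, version, started, build_failed):
    """Assemble an event from data only the client can see."""
    return InstallEvent(
        package=package,
        version=version,
        install_seconds=round(time.monotonic() - started, 2),
        build_failed=build_failed,
        cpu_arch=platform.machine(),       # e.g. "arm64" or "x86_64"
        os_version=platform.mac_ver()[0],  # empty string when not on macOS
    )
```

A server that only sees a download request gets none of this: install time, build outcome and CPU architecture exist only on the client, which is the difference being argued here. `asdict(event)` turns the event into a plain dict ready to serialise.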
> I fundamentally don't 'get' what richness they actually need.
That's fine. Perhaps you could ask them instead of ranting about what you don't know or don't 'get' in a public forum?
> to know their software works for their users
Sounds like you're not very far from understanding why they want better telemetry.
> Prior to release (...) and you know anything less will work too.
Things break in unexpected ways. OSs are complex systems, and there are a lot of interactions between components. Homebrew's user base is enormous and very diverse. There are 2 different architectures, many OS versions, lots of environment variables that might be set differently on each user's system, different versions of libraries, ... I could go on, but I think you get the picture.
I'm okay with sending data to the server when things break, as long as I'm asked to approve it each time. Is it safe to assume that if no errors or exceptions are encountered, nothing should be sent back home?
The Homebrew folks think this is a non-problem. You may agree or disagree, but the pull request is certainly a non-solution to this maybe-problem. Not just because it gained zero traction and did not get merged anywhere, but also because it's just an obscure band-aid. Either opt-out anonymous telemetry is a good idea, or it's a problem. If to you it's a problem, advocate for its removal in its entirety.
So even if I have an issue with telemetry, good on the Homebrew maintainers for ignoring this MR.
The status quo is different variables and even different mechanisms for each program. Many badly documented. Only out of context is DO_NOT_TRACK obscure.
I think you know developers who reject informed consent will never adopt an informed consent model. The proposal was the best users could hope for realistically. Did you never compromise?
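For readers unfamiliar with the proposal, a tool that honours both a shared opt-out variable and its own might check something like the following. It's a hypothetical Python sketch: `analytics_enabled` is not Homebrew's code, and treating only "1", "true" and "yes" as opt-out values is an assumption (real tools differ in how they parse these variables). `HOMEBREW_NO_ANALYTICS` is the opt-out variable Homebrew documents; `DO_NOT_TRACK` is the cross-tool convention discussed above.

```python
import os

# Values treated as "opt out"; an assumption for this sketch.
OPT_OUT_VALUES = {"1", "true", "yes"}


def analytics_enabled(env=None, tool_var="HOMEBREW_NO_ANALYTICS"):
    """Return False if either the shared or the tool-specific opt-out is set.

    `env` defaults to the real process environment but can be any
    mapping, which keeps the check easy to test.
    """
    env = os.environ if env is None else env
    for var in ("DO_NOT_TRACK", tool_var):
        if env.get(var, "").strip().lower() in OPT_OUT_VALUES:
            return False
    return True
```

Checking one shared variable alongside the tool's own means a user can set it once instead of hunting down each program's switch, which is the fragmentation the status-quo comment describes.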
"Without telemetry, developers rely on bug reports and surveys to find out when their software isn’t working or how it is being used. Both of these techniques are too limited in their effectiveness."
Isn’t it GDPR compliant if you never store the source IP at all? Then, from a GDPR perspective, there’s no user data to track and remove.
I’m not sure how organizations get audited to prove that they actually do that, and that there’s no other way to reidentify users (e.g. I download the prepend package every day, and that’s unusual enough to link back to me, to prepend, to the author of that package, etc.).
We only have their pinky promise that the new analytics are anonymous. For all we know this might be a PR operation because people increasingly dislike Google, and they'll sell the "anonymous" analytics to Google under the table.
I'll make it a goal to block all their tracking at the router level.
> We only have their pinky promise that the new analytics are anonymous.
Isn’t Homebrew open source? One could audit the source themselves. If you’re talking about what happens on the server side, then yes, but I don’t see how that differs from any other computer I connect to over the internet.
> Our self-hosted InfluxDB instance does not store either anonymised IP addresses or an anonymised user token so it has additional privacy benefits over Google Analytics.