HN Mirror


	by jillesvangurp 1217 days ago
	They'd be well advised to make this opt-in only for legal reasons. This is not going to go down well in a lot of places and they might get exposed to lawsuits.

3 comments

pancrufty 1217 days ago

Opt-in analytics are useless unless a large part of your user base clicks through the entire setup wizard without thinking, and there’s little overlap between those users and Homebrew’s user base.

mola 1217 days ago

Ok, so don't do analytics.

throwaway5959 1217 days ago

Yes, hardware2win, have a worse product. Not everything needs to be phoning home all the time.

hardware2win 1217 days ago

Then have a worse product for the users?

microtonal 1217 days ago

Homebrew was fine before they started collecting analytics. There are plenty of great package managers outside the macOS ecosystem that don't use analytics.

hardware2win 1217 days ago

So just because you weren't affected, no one was?

closewith 1217 days ago

Yes, you don't get to decide to violate your users' rights and surveil them because you think it will improve your product.

hardware2win 1217 days ago

1. It is their choice what software they use, isn't it? It's not like Chromium on Android being pushed on you.

2. There is "reasonable" / "good faith" data that, in my opinion, can be sent, e.g. crash logs and stats such as package popularity.

You just create drama over nothing.

I've used data like this to improve my software countless times and there is nothing shady about it; it all comes down to what you collect.

There's a difference between a keylogger or stealing nudes and technical data.

drivers99 1217 days ago

> it is their choice what software they use, isn't it

Yes. More evidence for an observation I've been having that ultimately the software is always about what the creator of the software wants, and not the user. I'm working on moving to running only my own software for that reason.

stalfosknight 1217 days ago

Not my problem. They don't have a right to spy on users.

sigzero 1217 days ago

Stop it. They are not spying on users.

stalfosknight 1216 days ago

Any time "analytics" are opt-out instead of opt-in, it's spying in my book.

_ph_ 1217 days ago

"The alternative is useless" is not a valid legal defense.

pimterry 1217 days ago

As in the post: they're intending to drop the GA part entirely within 90 days, and it sounds like the new metrics are entirely anonymous, and so not covered by GDPR etc. IANAL but as far as I can tell that should avoid all legal concerns once GA is gone.

ianai 1217 days ago

Why does a package manager need to track their users at all?

If you want usage statistics for packages just track how often individual packages are downloaded on the server side. A maintainer has no need to know who's installing what.
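For comparison, the server-side counting suggested here could look roughly like the sketch below. It assumes a hypothetical access-log format of `<client_ip> <path>` per line and a hypothetical `/bottles/<package>-<version>.tar.gz` path layout; neither is how Homebrew actually serves files. The point is only that the client address is parsed and immediately discarded, so just per-package totals survive.

```python
from collections import Counter


def count_downloads(log_lines):
    """Count downloads per package from access-log lines.

    Assumes a hypothetical log format "<client_ip> <path>", where bottle
    paths look like "/bottles/<package>-<version>.tar.gz". The client IP
    is thrown away immediately and never stored.
    """
    prefix = "/bottles/"
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        _client_ip, path = parts  # IP intentionally ignored
        if not path.startswith(prefix):
            continue  # not a package download
        filename = path[len(prefix):]
        # Package name is everything before the final "-<version>" segment.
        package = filename.rsplit("-", 1)[0]
        counts[package] += 1
    return counts
```

Note that this only works if you control the servers that hand out the files, which is exactly the assumption the reply below disputes.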

woodruffw 1217 days ago

> If you want usage statistics for packages just track how often individual packages are downloaded on the server side. A maintainer has no need to know who's installing what.

To be clear: Homebrew has no idea which users are installing what. We only store counters for package install, failure, etc. events, and everything that's stored is visible on the Homebrew website[1].

Homebrew's architecture doesn't really have a "server side" in the way your suggestion requires: the formulae and bottle components rely heavily on public services like GitHub Packages and GitHub Pages, which don't offer those kinds of analytics.

FD: Member of Homebrew.

[1]: https://formulae.brew.sh/analytics/
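To illustrate what "only counters" means in practice, here is a minimal sketch of an aggregate store that keeps one total per (event kind, package, OS version) and nothing else. The event names and fields are assumptions for illustration, not Homebrew's actual schema; the property that matters is that no IP or user token ever reaches the store.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: deliberately no IP, user token, or timestamp.
    kind: str        # assumed values: "install" (succeeded) or "install_failure"
    package: str
    os_version: str


class AggregateCounters:
    """Keeps only per-(kind, package, os_version) totals, never raw events."""

    def __init__(self):
        self._counts = Counter()

    def record(self, event: Event) -> None:
        # Once counted, individual events are indistinguishable from each other.
        self._counts[(event.kind, event.package, event.os_version)] += 1

    def count(self, kind: str, package: str, os_version: str) -> int:
        return self._counts[(kind, package, os_version)]

    def failure_rate(self, package: str, os_version: str) -> float:
        ok = self.count("install", package, os_version)
        failed = self.count("install_failure", package, os_version)
        total = ok + failed
        return failed / total if total else 0.0
```

With totals like these you can still answer "does this package fail more often on one OS version?" without knowing who sent any of the events.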

chongli 1217 days ago

What kind of decisions do you make based on the analytics? Do you drop unpopular packages? To me, one of the advantages of a package manager is having a huge database of long-tail packages that are just one command away. If you kept only the popular packages you might as well just have an installer that installs them all together in bulk.

woodruffw 1217 days ago

> Do you drop unpopular packages?

Yes: Homebrew deprecates and/or disables packages if we see evidence that they're unmaintained and not actually supported on the platforms we support, or only used by a tiny fraction of users while also requiring disproportionate maintainer time (e.g. due to complex or flaky builds).

The goal is to balance conflicting user interests: 99% of users want maintainer effort focused on the top 100 (or 500, or 1000) packages, and many of those packages also require significant maintainer effort (e.g. making sure that they don't cause transitive breakages).
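The two deprecation criteria described above can be read as a simple decision rule. The sketch below is hypothetical: the `PackageStats` fields and the `min_share` / `max_hours` thresholds are invented for illustration, and in reality maintainers make these calls by judgment, not by script.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    # Hypothetical per-package inputs; not a real Homebrew data structure.
    name: str
    maintained_upstream: bool
    supported_on_our_platforms: bool
    install_share: float                 # fraction of all installs, 0.0 to 1.0
    maintenance_hours_per_month: float


def should_consider_deprecation(p: PackageStats,
                                min_share: float = 0.0001,
                                max_hours: float = 4.0) -> bool:
    # Criterion 1: unmaintained upstream AND unsupported on our platforms.
    if not p.maintained_upstream and not p.supported_on_our_platforms:
        return True
    # Criterion 2: tiny usage AND disproportionate maintainer time.
    return p.install_share < min_share and p.maintenance_hours_per_month > max_hours
```

Under this reading, a niche package that is well maintained upstream and cheap to keep building is not flagged; low usage alone is not enough.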

chongli 1217 days ago

So if a package has a small community of users and is well-maintained upstream but not popular overall you'll keep it around?

pimterry 1217 days ago

> Why does a package manager need to track their users at all?

According to https://docs.brew.sh/Analytics they use it to measure how often formulas fail to install, to get overall metrics on which OS versions are used, and to correlate those (i.e. to tell on which OS versions specific packages fail to install correctly).

> A maintainer has no need to know who's installing what

Aside from the IP, they don't know who's installing what, and in the new model announced in this post they now don't store IPs or any other user token at all, so it should be purely anonymous aggregate metrics.

kelthan 1217 days ago

I agree that they don't need to know the "who", but it is perfectly understandable that they want to know "what" is being installed. And as part of the "what", they would want to know on which platform, and whether the install succeeded or failed, and probably a few other metrics about the install to ensure that things are working correctly and identify gaps that should be filled.

Based on what I read on the site, that looks like exactly what they are doing, and they are explicitly NOT storing information that would identify "who".

woodruffw 1217 days ago

Correct: there's no identifiable information being stored, either before or with these changes.

dmix 1217 days ago

They can already glean a lot of this since they run the hosted formula db anyway. A 90-day analytics capture isn't a big deal IMO.

Patrol8394 1217 days ago

Aside from the IP? IP nowadays is all you need…

hardware2win 1217 days ago

I think nowadays IP is less and less relevant when the majority of people are on dynamic IPs.

lvh 1217 days ago

They've been quite clear about what they store and it's not IPs.

ianai 1217 days ago

All of this should come in as a trouble ticket from a whiny user, which we know this type of user would be.

Edit: Also, this is for Mac OS. Choose a few standard OSes to support and test them. If a system update will fix the issue, then it shouldn't be fixed at the package manager level.

lamontcg 1217 days ago

I'd much rather deal with anonymized telemetry blowing up than with tickets from whiny-ass users.

ianai 1217 days ago

Does anybody put a paywall in front of submitting tickets? Seems like Homebrew could, if anybody. I.e. $10 to submit, and maybe more for a larger problem.

lamontcg 1217 days ago

> Why does a package manager need to track their users at all?

Do any of you actually work in this industry shipping software products to end users? Without telemetry the problem there is literally one of trying to read the mind of your end users to figure out what they're doing, hoping that your internal CI manages to reflect the configuration in their environment.

Shish2k 1217 days ago

I think HN has a very varied audience - some work in the industry, others want A/B testing to be made illegal on the grounds that it is non-consensual mind-control experimentation :P

bathtub365 1217 days ago

The groups of people who work in the industry and those who believe A/B testing is psychological experimentation aren’t disjoint.

jeltz 1217 days ago

I am in both groups. I work in the industry and I am so tired of colleagues wanting to grab whatever data they can get their grubby hands on and then barely use it at all for anything useful. So many companies collect data just in case.

hungryforcodes 1217 days ago

Users report issues to GitHub? It's not like Brew users aren't sophisticated in that sense.

In addition to being INCREDIBLY slow, now I have to worry about what it might spy on. If I have a problem, I'm more than happy to go to GitHub (or whichever site it's hosted on) and report it.

closewith 1217 days ago

I imagine many of us work shipping software to end users and also respect their right to privacy, and only track their actions with informed consent.

account42 1217 days ago

This industry has managed to ship software products without telemetry just fine - mass-collecting usage data from end users is only a relatively recent trend.

pdimitar 1217 days ago

Any actual arguments?

I don't see why something that's little more than a file server needs telemetry.

woodruffw 1217 days ago

Homebrew is a package manager with thousands of packages, not a file server. We maintain those packages, and knowing when they break (or can be deprecated due to lack of use) is critical to the project's sustenance.

pdimitar 1217 days ago

Okay, fair enough. But the breakage can't be detected without telemetry then, I take it?

If so, that's... not ideal for sure.

lamontcg 1217 days ago

And with packages that compile.

The software I used to be employed to maintain actually broke Homebrew compiles when both were installed at the same time (I think I made that better, but I never got the PM who actually owned the product to spend the resources to fix it properly).

A good example of how the configuration in the end user environment can affect package installation.

therealdrag0 1217 days ago

Have you looked at the analytics yet? Or are you only speaking from ideological priors?

The most valuable one I’d guess is package install error rates. Seems pretty useful to me.

shepherdjerred 1217 days ago

> something that's little more than a file server

You're doing an awful disservice to Homebrew.

pdimitar 1217 days ago

I am, yes, and sorry about it.

I don't like telemetry at all and I believe we have to find other ways to do QA. Hence my strong reaction.

brookst 1217 days ago

Maybe they want to include the most common packages in their unit tests, or understand usage patterns so they can prioritize development?

It’s very hard to write and maintain good software without knowing how it’s used. No package manager needs to know how you specifically use it, but aggregate data and the ability to identify scenarios it does not handle well are both very important for SW lifecycle.

ianai 1217 days ago

It's 2023. Hard drive space shouldn't be an issue. Test installing the full software suite, make it work, and you know the lesser installs will all work.

orf 1217 days ago

Do you think “hard drive space” is the constraining factor when building and testing over 6.5k third party packages?

Do you really not see any advantage to maintainers having visibility into what packages people actually use?

Larrikin 1217 days ago

How much would you be willing to pay so that Homebrew can maintain a large amount of hardware covering nearly all configurations?

justinclift 1217 days ago

Just checked their OpenCollective, and they seem to have about US$100k there:

https://opencollective.com/homebrew#category-BUDGET

They seem to be receiving about US$2k/month via Patreon too:

https://www.patreon.com/homebrew

I think their Patreon was around the same when I looked ~12 months ago.

shepherdjerred 1217 days ago

> Test installing the full software suite, make it work

Are you paying for the compute?

cassianoleal 1217 days ago

Assuming their claims of anonymity are true, they won't be tracking users at all.

I imagine they can get much richer metrics through this as opposed to only tracking downloads on the server side.

I'm not saying I like it. In fact, I plan to keep it disabled. I'm just saying it's a bit naïve to think client-side analytics are the same as server-side download tracking.

ianai 1217 days ago

Richer how, exactly? I fundamentally don't 'get' what richness they actually need.

If anything they need money, of course, and to know their software works for their users. Prior to release have a test system install the full base, test those packages work, and you know anything less will work too.

cassianoleal 1217 days ago

I haven't looked at what they're actually collecting, but here are a few things that come to mind:

- Time to install packages
- Versions of things
- Has the compilation (when required) failed?
- What dependency versions are installed?
- CPU architecture
- OS version
- ...

There's a lot more that can be sent from the client that's not available on the server side.
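Several of the items listed (CPU architecture, OS version, install duration, whether the build failed) are only knowable on the client. A rough sketch of how a client could assemble such a payload with Python's standard `platform` module; `build_install_event` and its field names are hypothetical, and the list above is the commenter's guesswork rather than what Homebrew actually collects.

```python
import platform


def build_install_event(package, version, succeeded, started_at, finished_at):
    """Build a hypothetical client-side analytics payload for one install.

    Field names are illustrative assumptions, not Homebrew's real schema.
    Note there is no IP address or user identifier in the payload.
    """
    mac_release = platform.mac_ver()[0]  # "" when not running on macOS
    return {
        "package": package,
        "version": version,
        "succeeded": succeeded,
        "duration_s": round(finished_at - started_at, 3),
        "arch": platform.machine(),      # e.g. "arm64" or "x86_64"
        "os_version": mac_release or platform.platform(),
    }
```

A server only sees a request for a file; none of these fields (how long the build took, whether it failed, on which architecture) appear in a plain download log.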

> I fundamentally don't 'get' what richness they actually need.

That's fine. Perhaps you could ask them instead of ranting about what you don't know or don't 'get' in a public forum?

> to know their software works for their users

Sounds like you're not very far from understanding why they want better telemetry.

> Prior to release (...) and you know anything less will work too.

Things break in unexpected ways. OSs are complex systems and there are a lot of interactions between components. Homebrew's user base is enormous and very diverse. There are 2 different architectures, many OS versions, lots of environment variables that might be set differently on each user's system, different versions of libraries, ... I could go on but I think you get the picture.

Edit: s/collected/collecting/

grugagag 1217 days ago

I'm okay with sending data to the server if things break, as long as I'm asked to OK it each time. Is it safe to assume that if no errors or exceptions are encountered, nothing should be sent back home?

mo_42 1217 days ago

This discussion on GitHub reveals the mindset of the Homebrew people: https://github.com/Homebrew/brew/pull/6745

kspacewalk2 1217 days ago

The Homebrew folks think this is a non-problem. You may agree or disagree, but the pull request is certainly a non-solution to this maybe-problem. Not just because it gained zero traction and did not get merged anywhere, but also because it's just an obscure band-aid. Either opt-out anonymous telemetry is a good idea, or it's a problem. If to you it's a problem, advocate for its removal in its entirety.

So even if I have an issue with telemetry, good on the Homebrew maintainers for ignoring this MR.

pseudalopex 1217 days ago

The status quo is different variables and even different mechanisms for each program. Many badly documented. Only out of context is DO_NOT_TRACK obscure.

I think you know developers who reject informed consent will never adopt an informed consent model. The proposal was the best users could hope for realistically. Did you never compromise?
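For a sense of how small the DO_NOT_TRACK ask is, here is a sketch of a client-side check that honours both Homebrew's own documented opt-out variable, `HOMEBREW_NO_ANALYTICS`, and the generic `DO_NOT_TRACK` convention. `analytics_enabled` is a hypothetical helper, and treating any value other than empty or "0" as an opt-out is an assumption of this sketch; each tool defines its own exact semantics.

```python
import os


def analytics_enabled(env=None):
    """Return False if the user has opted out of analytics.

    Checks HOMEBREW_NO_ANALYTICS (Homebrew's documented opt-out) and the
    generic DO_NOT_TRACK convention. This sketch assumes any value other
    than "" or "0" means "opt out".
    """
    if env is None:
        env = os.environ
    for var in ("HOMEBREW_NO_ANALYTICS", "DO_NOT_TRACK"):
        value = env.get(var, "").strip()
        if value and value != "0":
            return False
    return True
```

Passing `env` explicitly keeps the check testable without touching the real process environment.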

cratermoon 1217 days ago

"Without telemetry, developers rely on bug reports and surveys to find out when their software isn’t working or how it is being used. Both of these techniques are too limited in their effectiveness."

https://research.swtch.com/telemetry-intro

capitol_ 1217 days ago

It's hard to send stuff over the Internet without exposing some personal information, like your IP address.

I guess they might send it over Tor to get around that.

prepend 1217 days ago

Isn’t it GDPR compliant if you never store the source ip at all? So from a GDPR perspective there’s no user data to track and remove.

I’m not sure how organizations get audited to prove that they actually do that and that there’s no other way to reidentify users (eg, I download the prepend package every day and that’s unusual enough to link that it’s me, prepend, the author of that package, etc etc).
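One common mitigation for the re-identification risk described here (a rare package installed daily by essentially one person) is small-cell suppression: drop any aggregate bucket below a threshold before publishing. A minimal sketch; the threshold `k=10` is an arbitrary assumption, and nothing in this thread says Homebrew applies such a rule.

```python
def suppress_small_counts(counts, k=10):
    """Drop any aggregate bucket seen fewer than k times before publishing.

    counts: mapping of bucket (e.g. package name) -> number of events.
    Buckets below the threshold are withheld, so a package used by only
    a handful of people doesn't single them out in the public numbers.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {bucket: n for bucket, n in counts.items() if n >= k}
```

This only limits what the published aggregates reveal; it says nothing about what a server could log before aggregation, which is the auditing question raised above.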

mnot 1217 days ago

This is exactly the use case that Oblivious HTTP is being built for in the IETF.

pdimitar 1217 days ago

We only have their pinky promise that the new analytics are anonymous. For all we know this might be a PR operation because people increasingly dislike Google, and they'll sell the "anonymous" analytics to Google under the table.

I'll make it a goal to stop all their tracking on the level of my router.

derstander 1217 days ago

> We only have their pinky promise that the new analytics are anonymous.

Isn’t Homebrew open source? One could audit the source themselves. If you’re talking about what happens on the server side yes, but I don’t see the difference with any other computer I connect with over the internet.

[1] https://github.com/Homebrew/brew

woodruffw 1217 days ago

Homebrew is not a custodian of any personally identifiable data.

pdntspa 1217 days ago

Does that include IP addresses? Because I think that is considered PII

woodruffw 1217 days ago

Homebrew does not store IP addresses, so yes.

You can see the totality of the information stored on the Homebrew website[1].

[1]: https://formulae.brew.sh/analytics/

eddieroger 1217 days ago

From the post:

> Our self-hosted InfluxDB instance does not store either anonymised IP addresses or an anonymised user token so it has additional privacy benefits over Google Analytics.