Hacker News new | ask | show | jobs
by doctorpangloss 817 days ago
> As has been frequently pointed out, the benefits from improved profiling cannot be understated, even a 10% cost to having frame pointers can be well worth it when you leverage that information to target the actual bottlenecks that are eating up your cycles.

Few can leverage that information because the open source software you are talking about lacks telemetry in the self hosted case.

The profiling issue really comes down to the cultural opposition in these communities to collecting telemetry and opening it for anyone to see and use. The average user struggles to ally with a trustworthy actor who will share the information like profiling freely and anonymize it at a per-user level, the level that is actually useful. Such things exist, like the Linux hardware site, but only because they have not attracted the attention of agitators.

Basically users are okay with profiling, so long as it is quietly done by Amazon or Microsoft or Google, and not by the guy actually writing the code and giving it out for everyone to use for free. It’s one of the most moronic cultural trends, and blame can be put squarely on product growth grifters who equivocate telemetry with privacy violations; open source maintainers, who have enough responsibilities as is, besides educating their users; and Apple, who have made their essentially vaporous claims about privacy a central part of their brand.

Of course people know the answer to your question. Why doesn’t Google publish every profile of every piece of open source software? What exactly is sensitive about their workloads? Meta publishes a whole library about every single one of its customers, for anyone to freely read. I don’t buy into the holiness of the backend developer’s “cleverness” or whatever is deemed sensitive, and it’s so hypocritical.

3 comments

> Basically users are okay with profiling, so long as it is quietly done by Amazon or Microsoft or Google, and not by the guy actually writing the code and giving it out for everyone to use for free.

No; the groups are approximately "cares whether software respects the user, including privacy", or "doesn't know or doesn't care". I seriously doubt that any meaningful number of people are okay with companies invading their privacy but not smaller projects.

"Agitators". We don't trust telemetry precisely because of comments like that. World is full of people like you who apparently see absolutely nothing wrong with exfiltrating identifying information from other people's computers. We have to actively resist such attempts, they are constant, never ending and it only seems to get worse over time but you dismiss it all as "cultural opposition" to telemetry.

For the record I'm NOT OK with being profiled, measured or otherwise studied in any way without my explicit consent. That even extends to the unethical human experiments that corporations run on people and which they euphemistically call A/B tests. I don't care if it's Google or a hobbyist developer, I will block it if I can and I will not lose a second of sleep over it.

> World is full of people like you who apparently see absolutely nothing wrong with exfiltrating identifying information from other people's computers.

True. But such people are like cockroaches. They know what they are doing will be unpopular with their targets, so they keep it hidden. This is easy enough to do in closed designs, car manufacturers selling your driving habits to insurance companies and health monitoring app selling menstrual cycle data to retailers selling to women.

Compare that do say Debian and RedHat. They too collect performance data. But the code is open source, Debian has repeatable builds so you can be 100% sure that is the code in use, and every so often someone takes a look it. Guess what, the data they send back is so unidentifiable it satisfies even the most paranoid of their 1000's of members.

All it takes is a little bit of sunlight to keep the cockroaches at bay, and then we can safely let the devs collect the data they need to improve code. And everyone benefits.

I fully support the schemes for telemetry that already happens. It’s just obnoxious that there’s no apparent reason behind the support of one kind of anonymous telemetry versus another. For every 1 person who digests the package repo’s explanation of the telemetry strategy, there are 19 who feel okay with GitHub having all the telemetry and none of the open source repos they host because of vibes.
I think the kind of profiling information you're imagining is a little different from what I am.

Continuous profiling of your system that gets relayed to someone else by telemetry is very different from continuous profiling of your own system, handled only by yourself (or, generalising, your community/group/company). You seem to be imagining we're operating more in the former, whereas I am imagining more in the latter.

When it's our own system, better instrumented for our own uses, and we're the only ones getting the information, then there's nothing to worry about, and we can get much more meaningful and informative profiling done when more information about the system is available. I don't even need telemetry. When it's "someone else's" system, in other words, when we have no say in telemetry (or have to exercise a right to opt-out, rather than a more self-executing contract around opt-in), then we start to have exactly the kinds of issues you're envisaging.

When it's not completely out of our hands, then we need to recognise different users, different demands, different contexts. Catering to the user matters, and when it comes to sensitive information, well, people have different priorities and threat models.

If I'm opening a calendar on my phone, I don't expect it to be heavily instrumented and relaying all of that, I just want to see my calendar. When I open a calendar on my phone, and it is unreasonably slow, then I might want to submit relevant telemetry back in some capacity. Meanwhile, if I'm running the calendar server, I'm absolutely wanting to have all my instrumentation available and recording every morsel I reasonably can about that server, otherwise improving it or fixing it becomes much harder.

From the other side, if I'm running the server, I may want telemetry from users, but if it's not essential, then I can "make do" with only the occasional opt-in telemetry. I also have other means of profiling real usage, not just scooping it all up from unknowing users (or begrudging users). Those often have some other "cost", but in turn, they don't have the "cost" of demanding it from users. For people to freely choose requires acknowledging the asymmetries present, and that means we can't just take the path of least resistance, as we may have to pay for it later.

In short, it's a consent issue. Many violate that, knowingly, because they care not for the consequences. Many others don't even seem to think about it, and just go ahead regardless. And it's so much easier behind closed doors. Open source in comparison, even if not everything is public, must contend with the fact that the actions and consequences are (PRs, telemetry traffic, etc), so it inhabits a space in which violating consent is much more easily held accountable (though no guarantee).

Of course, this does not mean it's always done properly in open source. It's often an uphill battle to get telemetry that's off-by-default, where users explicitly consent via opt-in, as people see how that could easily be undermined, or later invalidated. Many opt-in mechanisms (e.g. a toggle in the settings menu) often do not have expiration built in, so fail to check at a later point that someone still consents. Not to say that's the way you must do it, just giving an example of a way that people seem to be more in favour of, as with the generally favourable response to such features making their way into "permissions" on mobile.

We can see how the suspicion creeps in, informed by experience... but that's also known by another word: vigilance.

So, users are not "okay" with it. There's a power imbalance where these companies are afforded the impunity because many are left to conclude they have no choice but to let them get away with it. That hasn't formed in a vacuum, and it's not so simple that we just pull back the curtain and reveal the wizard for what he is. Most seem to already know.

It's proven extremely difficult to push alternatives. One reason is that information is frequently not ready-to-hand for more typical users, but another is that said alternatives may not actually fulfil the needs of some users: notably, accessibility remains hugely inconsistent in open source, and is usually not funded on par with, say, projects that affect "backend" performance.

The result? Many people just give their grandma an iPhone. That's what's telling about the state of open source, and of the actual cultural trends that made it this way. The threat model is fraudsters and scammers, not nation-state actors or corporate malfeasance. This app has tons of profiling and privacy issues? So what? At least grandma can use it, and we can stay in contact, dealing with the very real cultural trends towards isolation. On a certain level, it's just pragmatic. They'd choose differently if they could, but they don't feel like they can, and they've got bigger worries.

Unless we do different, the status quo will remain. If there's any agitation to be had, it's in getting more people to care about improving things and then actually doing them, even if it's just taking small steps. There won't be a perfect solution that appears out of nowhere tomorrow, but we only have a low bar to clear. Besides, we've all thought "I could do better than that", so why not? Why not just aim for better?

Who knows, we might actually achieve it.