by lbrito 2346 days ago
So the solution is to use Amazon's tracker?

I appreciate the problem and I would like to stop using GA in my static pages as well, but trading one privately-owned software from a tech giant for another privately-owned software from a different tech giant seems a bit ludicrous. I would readily swap GA for some decent open-source solution though.


I'd say the following to this (very reasonable) argument against using AWS: AWS makes money by selling services, not collecting data. Should Amazon make the leap and start harvesting data from AWS for marketing purposes, the data from their analytics platform will be the least of our worries.

Thus far, AWS has proven to be a safe place for companies to host their data, and there have been no leaks of data stored in AWS into Amazon's marketing program. The HIPAA, PCI, and FedRAMP certifications help back up their claim that a company's AWS data stays in AWS.

People said the same about Apple, and then it turned out to be bullshit.[0] And, big surprise, their customers did not seem to hold them accountable.

So in the end, paying for stuff just shows that you're a more valuable product to sell. And gives them a great primary key to track you by.

[0]: https://news.ycombinator.com/item?id=22106536

I've said this to most naysayers, but AWS targets businesses, which are far more willing than consumers to use their wallets and lawyers to protect their privacy. AWS can't safely market data stored in their systems without running the risk of being abandoned and sued into oblivion.

Even Google doesn't dare monetize data stored in their enterprise customers' databases.

EDIT: Failing to offer a privacy-enhancing feature, and actively compromising and mining your customers data are quite different scenarios.

Google doesn't dare directly monetize data stored in their customers' databases. There's zero doubt in my mind they are piping everything somewhere, completely anonymized, and feeding their models.
> their customers did not seem to hold them accountable.

That story is 48 hours old; it remains to be seen what the effects will be.

>People said the same about Apple, and then it turned out to be bullshit.

I don't understand this claim at all. It's always been clear that if you're paranoid about the state, you should turn off iCloud backups. And it's not like Apple is selling your backups either.

This is a tech oriented forum where we’re (as a group) more privacy conscious and more generally aware of how things are stored than the general public.

And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.

If even some of us here have been caught by surprise, what are the chances that the general public is any better informed?

>And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.

Anyone who thought Apple kept iCloud backups fully encrypted was being willfully ignorant. Apple has been fully open about the fact that they share iCloud backups with the FBI; this exact situation happened in the San Bernardino case, where Apple provided the backups but the FBI cried about how Apple wouldn't unlock the phone.

The tinfoil hat advice has always been to turn off iCloud backups. I don’t buy that anyone privacy-conscious could have missed the fact that even Snowden has been saying “use an iPhone but turn off iCloud backups” for the last several years.

If you got caught by surprise, then you weren’t paying attention. Apple wasn’t keeping it a secret that they would share your backups with law enforcement. The only reason this is possible is because today they don’t require you to enter a password to restore your iPhone to a new phone. Even the way the technology works today implies that Apple can read your iCloud backups without you knowing.

Or is that wishful thinking? AWS and Amazon are in the business of collecting, storing, and processing data.
Can you provide a citation about that with AWS and its customers? Amazon itself, I have no problem believing (and have seen it in action). However, AWS is run separately from Amazon('s storefront, specifically).

If AWS were really aggregating customers' data stored on AWS' platform, I think we'd be seeing a lot more about it in the news. And there would be a lot more than just Walmart advocating against its use.

Is that relevant? Nobody can provide such a citation for Google Cloud Platform either; the discussion is entirely around reputation of the parent company, not what the company claims it does with cloud services.
AWS mining their customers' data would be a huge liability for their cloud business and a blatant violation of their service agreements. It would fundamentally break the concept of using cloud services like AWS or Azure.

This is why this comparison doesn't make sense. Google Analytics is a third-party service where you have no control at all over your data. You put a script in your app, and then you basically funnel data to them. That's it.

Using Pinpoint, in this case, is the equivalent of using EC2 and S3. You can control the flow and the lifecycle of data (deleting data forever, for example).

You should be able to trust that your customer data is safe there. Otherwise, why use AWS at all? Or better yet, why use any public cloud infrastructure provider at all?

If you're concerned about that, you probably should build your own self-managed server infrastructure.

There is one exception to this that I know of: AWS Rekognition will re-use the faces you scan with it for training and "to improve and develop the quality of Amazon Rekognition and other Amazon machine-learning/artificial-intelligence technologies." That's so vague, they can reuse it anywhere Amazon (not just AWS) uses machine learning.

You can opt out though.

First point under Data Privacy: https://aws.amazon.com/rekognition/faqs/

This is why I use Prime Photos instead of Google Photos. I need one, so I’ll take the paid one.
I wouldn't conflate their general motives nor their data handling policies between their b2b offerings and their b2c offerings.
The idea that just because you're paying for something they're not sharing your data, or won't sell it in the future, seems a bit naive. If you don't control the server, you have no idea what they are doing. You're just hoping they don't sell your data.
Over the long term it's possible that maximizing shareholder value requires monetization of additional resources.

The approach of 'today, company X is known to make money via Y, so we can trust them with private data' only works until that data becomes valuable enough for the company to invest in extracting.

Monetizing data stored in AWS would destroy their credibility with corporations, who are much more proactive in protecting themselves (both legally and with their wallets) than consumers are.
Upvoted, and I don't disagree with your premise - we'll see how this plays out over the long term.
Well, AWS launched 13 years ago, and so far so good. It's no IBM, but it is a track record to consider.
I'm only aware of https://matomo.org/ (formerly Piwik) as a good open source alternative to Google Analytics. Are there others?
I can vouch for Countly ( https://github.com/countly ) which is open source, supports a few different platforms (web, iOS, Android), and has a nice administrative web interface.

The web SDK also supports collection of client-side JavaScript errors, which is neat for tracking down bugs and things which might harm user experience.

I'm a fan of Fathom[0]. The amount of data and insight is light compared to GA and others but if it meets your needs it's pretty great.

[0] https://github.com/usefathom/fathom

Do you know if Fathom will remain open source and/or self-hostable? (https://github.com/usefathom/fathom/issues/268)
Nice answer from the developer:

> We are keeping this version open-source, forever, and committing to maintain it. We also have a business to run, and while we love open-source, it isn't paying our bills (and Fathom takes a lot of work from 2 people to keep going) and we're not a charity.

> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted. But, since we want to keep going with Fathom, we have to separate V1 and V2 so we can make it sustainable. Otherwise we'd have to abandon it (which serves no one).

> If you truly want your complaint heard, maybe contribute to what you're complaining about (financially, time, effort, etc). My wife always tells me that I'm not allowed to gripe unless I'm also taking action.

Kind of a nice answer. Their comment about contributing isn't really valid, though: I'm not going to contribute to something they have already solved but not released as open source. They already have the answer. I understand that it's incredibly difficult to make money with open source, and that maintaining it sucks. But then just stop being open source and be another analytics company. Pretending to be both, while asking for contributions to things you've already written in your private repo, doesn't work.
I think we read it differently. In my view this is as about as nice as one can be:

> We are keeping this version open-source, forever, ...

Here they are committing to keeping their existing open source code around instead of taking their toys and going home.

> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted.

I think they aren't asking for contributions to v2 but rather for contributions to v1 which they are committed to keeping Open Source. But even then they acknowledge that users are free to use it without contributing in any way.

The open-source (self-hosted) version of Fathom is dead. The only commits they have been pushing to it in the last 12 months have been upsells for their closed-source & centralized paid offering.
Agreed. They were super nice on their GitHub issues about removing cookies as a requirement (required on mobile), and then the feature was released only in the closed-source version. Don't use Fathom if you don't want to pay. I made that mistake for home projects.
> I would like to stop using GA in my static pages

Why don't you? Do you really need this tracking at all?

Depends on the job, the purpose of the site, and the goals being accomplished. It's hard to demonstrate value and improve digital business outcomes without GA (and other tools like Heap Analytics).
I think it depends on what type of business you conduct...

Just like people are putting Alexa in their homes and businesses, it can be potentially used for anti-competitive reasons, or to get inside information. This will get worse over time as we know...

A simple example is hacking a CEO's Alexa to listen to his phone calls at home to get insider trading tips...

The companies that make these products are not bound by an official code of ethics, and governments barely understand the implications of technology, much less the corruption of technology. Laws to prevent misuse and manipulation of consumer products are weak, but investigation and enforcement of those laws are even weaker.

Google has also been changing the Chrome browser to suit its information-gathering needs as well. If they own the majority of market share, they won't really need their analytics tools on every site. We need to start thinking on a larger scale about how companies can influence culture, markets, and lives, and how to ensure there are proper rules in place to prevent catastrophe.

Yeah, the "value" being shown rarely actually investigates causal impact. Else they'd recognize that the literature shows that digital advertising has no measurable (provable) impact.[0]

I recognize that there is more to digital business than ads, such as paternalistic commercial guidance, dark patterns, web traversal, and so on. However, I haven't seen proof that these patterns matter, especially given recent critiques of A/B testing (relative to multi-armed bandits).

[0] https://www.gwern.net/docs/traffic/2015-lewis.pdf

> Else they'd recognize that the literature shows that digital advertising has no measurable (provable) impact.[0]

This is a gross mischaracterization of what the linked paper says.

"It's extremely difficult to measure the impact" (the claim that the cited paper puts forward) has quite a different meaning from "no measurable impact." The paper is entirely about the difficulty of measuring the impact, and studiously avoids, as far as I can tell, any inference about what the impact actually was in these experiments. For example, Table I gives the mean of the control group sales and a standard deviation, but no mean of the treatment group sales, which you would need to do a statistical test of whether there was an impact; similarly, Table II reports standard deviations of the sales effect and ROI, but not means; Table III presents power calculations based on hypothetical ROIs and the real measured standard deviations, but gives no clue as to what the real ROIs were. Nowhere does the paper support your claim that there is no measurable impact.

In addition, the situation is very different for small companies that most people have never heard of. The article is citing studies done with major corporations with millions of customers and that are already well established in the collective consciousness. Measuring the effect of an advertising campaign taking you from 3.23 million customers to 3.231 million customers is indeed very difficult, particularly when you might fluctuate by tens of thousands of customers on a weekly basis. Measuring the effect of an AdWords campaign taking you from 200 customers to 250 customers is much easier.
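To put rough numbers on that comparison: the sketch assumes the large brand's weekly noise is 10,000 customers (the low end of "tens of thousands") and that the small company's counts have Poisson-like noise, i.e. a standard deviation of about √200 ≈ 14. Both noise figures are illustrative assumptions, not numbers from the paper.

```python
import math


def signal_to_noise(before: float, after: float, noise_sd: float) -> float:
    """Observed lift expressed in units of week-to-week standard deviation."""
    return (after - before) / noise_sd


# Large, established brand: +1,000 customers against an ASSUMED 10,000 sd of weekly noise.
big = signal_to_noise(3_230_000, 3_231_000, noise_sd=10_000)

# Small company: +50 customers; noise ASSUMED Poisson-like, so sd is about sqrt(mean).
small = signal_to_noise(200, 250, noise_sd=math.sqrt(200))

print(f"large brand: {big:.2f} sd")      # prints "large brand: 0.10 sd"
print(f"small company: {small:.2f} sd")  # prints "small company: 3.54 sd"
```

Under these assumptions the big brand's lift is about a tenth of one standard deviation, buried in the noise, while the small company's lift is roughly three and a half standard deviations, which is why the smaller campaign is so much easier to detect.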

And the freebie GA doesn't limit you to a 25% sample across the board; basically, the bigger the date range and the number of sessions, the more sampling you get.

I'd be interested to see how Amazon's setup would handle the setup I'm the backup analytics nerd for.

It's a major beverage brand with hundreds of websites and multiple locales per site (a dozen or more for the big brands), and it has to handle roll-up reporting as well as custom metrics and dimensions.

In this case, AWS is contractually obliged not to use your data. My criticism of the article is that the author mentions that users are blocking access to third party trackers, but that means AWS Pinpoint set up in the way he suggests will get blocked as well. People who have that concern have to expose Pinpoint endpoints on their own domains or implement some other first party tracking solution.
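For the first-party option, one minimal shape is a same-origin relay: the page posts events to a path on your own domain, and your server forwards them to the analytics backend, so blockers that match on third-party tracker hostnames have nothing to match. This is a stdlib-only Python sketch, not a Pinpoint integration; `UPSTREAM_URL`, the `/collect` path, `build_upstream_request`, and `CollectHandler` are all hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

# HYPOTHETICAL upstream ingestion URL; substitute whatever endpoint your
# analytics backend actually exposes.
UPSTREAM_URL = "https://analytics-backend.example.internal/events"


def build_upstream_request(body: bytes, content_type: str) -> urllib.request.Request:
    """Wrap a browser-sent event payload in a POST to the upstream collector."""
    return urllib.request.Request(
        UPSTREAM_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


class CollectHandler(BaseHTTPRequestHandler):
    """Serves POST /collect on the site's own domain and relays it upstream."""

    def do_POST(self):
        if self.path != "/collect":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        req = build_upstream_request(
            body, self.headers.get("Content-Type", "application/json")
        )
        try:
            with urllib.request.urlopen(req, timeout=5):
                pass
            self.send_response(204)  # accepted, no response body
        except OSError:  # URLError, HTTPError and timeouts all subclass OSError
            self.send_response(502)  # upstream rejected or unreachable
        self.end_headers()
```

To try it locally, run `HTTPServer(("127.0.0.1", 8080), CollectHandler).serve_forever()` and point the page's tracking script at `/collect`. A real deployment would also need to authenticate to the backend and validate payloads, which this sketch omits.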