I appreciate the problem and I would like to stop using GA in my static pages as well, but trading one privately-owned software from a tech giant for another privately-owned software from a different tech giant seems a bit ludicrous. I would readily swap GA for some decent open-source solution though.
I'd say the following to this (very reasonable) argument against using AWS: AWS makes money by selling services, not collecting data. Should Amazon make the leap and start harvesting data from AWS for marketing purposes, the data from their analytics platform will be the least of our worries.
Thus far, AWS has proven to be safe for companies to host their data on, and there have been no known leaks of data stored in AWS into Amazon's marketing program. The HIPAA, PCI, and FedRAMP certifications help back up their claims that a company's AWS data stays in AWS.
I've said this to most naysayers, but AWS aims itself at businesses, who are much more forward with their pocketbook and lawyers in protecting their privacy. AWS can't safely market data stored in their systems without running the risk of being abandoned and sued into oblivion.
Even Google doesn't dare monetize data stored in their enterprise customers' databases.
EDIT: Failing to offer a privacy-enhancing feature, and actively compromising and mining your customers' data are quite different scenarios.
Google doesn't dare directly monetize data stored in their customers' databases. There's zero doubt in my mind they are piping everything somewhere, completely anonymized, and feeding their models.
>People said the same about Apple, and then it turned out to be bullshit.
I don't understand this claim at all. It's always been clear that if you're paranoid about the state, you should turn off iCloud backups. And it's not like Apple is selling your backups either.
This is a tech oriented forum where we’re (as a group) more privacy conscious and more generally aware of how things are stored than the general public.
And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.
If even some of us (as a group) have been caught by surprise, what are the chances that the general public is any better informed?
>And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.
Anyone who thought Apple kept iCloud backups fully encrypted was being willfully ignorant. Apple has been fully open about the fact that they share iCloud backups with the FBI. This exact same situation happened in the San Bernardino case, where Apple provided the backups but the FBI cried about how Apple wouldn’t unlock the phone.
The tinfoil hat advice has always been to turn off iCloud backups. I don’t buy that anyone privacy conscious should have missed the fact that even Snowden was saying “use an iPhone but turn off iCloud backups” for the last several years.
If you got caught by surprise, then you weren’t paying attention. Apple wasn’t keeping it a secret that they would share your backups with law enforcement. The only reason this is possible is that today they don’t require you to enter a password to restore your backup onto a new phone. Even the way the technology works today implies that Apple can read your iCloud backups without you knowing.
Can you provide a citation about that with AWS and its customers? Amazon itself, I have no problem believing (and have seen it in action). However, AWS is run separately from Amazon('s storefront, specifically).
If AWS were really aggregating customers' data stored on AWS's platform, I think we'd be seeing a lot more about it in the news. And there would be a lot more than just Walmart advocating against its use.
Is that relevant? Nobody can provide such a citation for Google Cloud Platform either; the discussion is entirely around reputation of the parent company, not what the company claims it does with cloud services.
AWS mining their customers' data would be a huge liability for their cloud business and a blatant violation of their service agreements. It would fundamentally break the concept of using cloud services like AWS or Azure.
This is why this comparison doesn't make sense. Google Analytics is a 3rd party service where you have no control at all of your data. You put a script in your app, and then you basically funnel data to them. That's it.
Using Pinpoint, in this case, is the equivalent of using EC2 and S3. You can control the flow and the lifecycle of data (deleting data forever, for example).
You should be able to trust that your customer data is safe there. Otherwise, why use AWS at all? Or, better yet, why use any public cloud infrastructure provider at all?
If you're concerned about that, you probably should build your own self-managed server infrastructure.
There is one exception to this that I know of: AWS Rekognition will re-use the faces you scan with it for training and "to improve and develop the quality of Amazon Rekognition and other Amazon machine-learning/artificial-intelligence technologies." That's so vague, they can reuse it anywhere Amazon (not just AWS) uses machine learning.
The idea that just because you're paying for something it means they're not sharing your data or won't sell your data in the future seems a bit naive. If you don't control the server you have no idea what they are doing. You're just hoping they don't sell your data.
Over the long term it's possible that maximizing shareholder value requires monetization of additional resources.
The approach of 'today, company X is known to make money via Y, so we can trust them with private data' only works until that data becomes valuable enough for the company to invest in extracting.
Monetizing data stored in AWS would destroy their credibility with corporations, who are much more proactive in protecting themselves (both legally and with their wallets) than consumers are.
I can vouch for Countly ( https://github.com/countly ) which is open source, supports a few different platforms (web, iOS, Android), and has a nice administrative web interface.
The web SDK also supports collection of client-side JavaScript errors, which is neat for tracking down bugs and things which might harm user experience.
> We are keeping this version open-source, forever, and committing to maintain it. We also have a business to run, and while we love open-source, it isn't paying our bills (and Fathom takes a lot of work from 2 people to keep going) and we're not a charity.
> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted. But, since we want to keep going with Fathom, we have to separate V1 and V2 so we can make it sustainable. Otherwise we'd have to abandon it (which serves no one).
> If you truly want your complaint heard, maybe contribute to what you're complaining about (financially, time, effort, etc). My wife always tells me that I'm not allowed to gripe unless I'm also taking action.
Kind of a nice answer. Their comment about contributing is not really valid, though: I'm not going to contribute to something they have already solved and not released as open source. They already have the answer. And I understand that it is incredibly difficult to make money from open source and that maintaining it sucks. But then just stop being open source and be another analytics company. Trying to pretend you are both, and wanting contributions to things you've already written in your private repo, doesn't work.
The open-source (self-hosted) version of Fathom is dead. The only commits they have been pushing to it in the last 12 months have been upsells for their closed-source & centralized paid offering.
Agreed. They were super nice on their GitHub issues about removing cookies as a requirement (required on mobile), and then it was released only in the non-open-source version. Don't use Fathom if you don't want to pay. I made that mistake for home projects.
Depends on the job, purpose of site, and goals being accomplished. Hard to demonstrate value and improve digital business outcomes without GA (and other tools like HEAP Analytics).
I think it depends on what type of business you conduct...
Just like people are putting Alexa in their homes and businesses, it can be potentially used for anti-competitive reasons, or to get inside information. This will get worse over time as we know...
A simple example is hacking a CEO's Alexa to listen to his phone calls at home to get insider trading tips...
The companies that make these products are not bound by an official code of ethics, and governments barely understand the implications of technology, much less the corruption of technology. Laws to prevent misuse and manipulation of consumer products are weak, but proper investigation and enforcement of those laws are even weaker.
Google has also been changing the Chrome browser to suit its information-gathering needs. If they own the majority of market share, they won't really need their analytics tools on every site. We need to start thinking on a larger scale about how companies can influence culture, markets, and lives, and how to ensure there are proper rules in place to prevent catastrophe.
Yeah, the "value" being shown rarely actually investigates causal impact. Else they'd recognize that the literature shows that digital advertising has no measurable (provable) impact.[0]
I recognize that there is more to digital business than ads, such as paternalistic commercial guidance, dark patterns, web traversal, and so on. However, I haven't seen proof that these patterns matter, especially given recent critiques of A/B testing (relative to multi-armed bandits).
> Else they'd recognize that the literature shows that digital advertising has no measurable (provable) impact.[0]
This is a gross mischaracterization of what the linked paper says.
"It's extremely difficult to measure the impact" (the claim that the cited paper puts forward) has quite a different meaning from "no measurable impact." The paper is entirely about the difficulty of measuring the impact, and studiously avoids, as far as I can tell, any inference about what the impact actually was in these experiments. For example, Table I gives the mean of the control group sales and a standard deviation, but no mean of the treatment group sales, which you would need to do a statistical test of whether there was an impact; similarly, Table II reports standard deviations of the sales effect and ROI, but not means; Table III presents power calculations based on hypothetical ROIs and the real measured standard deviations, but gives no clue as to what the real ROIs were. Nowhere does the paper support your claim that there is no measurable impact.
In addition, the situation is very different for small companies that most people have never heard of. The article is citing studies done with major corporations with millions of customers and that are already well established in the collective consciousness. Measuring the effect of an advertising campaign taking you from 3.23 million customers to 3.231 million customers is indeed very difficult, particularly when you might fluctuate by tens of thousands of customers on a weekly basis. Measuring the effect of an AdWords campaign taking you from 200 customers to 250 customers is much easier.
And the freebie GA doesn't limit you to a 25% sample across the board. Basically, the bigger the date range and the number of sessions, the more sampling you get.
I'd be interested to see how Amazon's setup would handle the setup I am the backup analytics nerd for.
It's a major beverage brand: hundreds of websites, multiple locales per site (a dozen or more for the big brands), and it has to handle roll-ups as well as custom metrics and dimensions.
In this case, AWS is contractually obliged not to use your data. My criticism of the article is that the author mentions that users are blocking access to third party trackers, but that means AWS Pinpoint set up in the way he suggests will get blocked as well. People who have that concern have to expose Pinpoint endpoints on their own domains or implement some other first party tracking solution.
I see a lot of suggestions for free or open-source analytics packages, but I would refrain from recommending anything you haven't personally used.
I've tried to separate myself from Google in various ways, and one of those was to replace Google Analytics with open source software. I tried several; they're all either non-functional out of the box, or require significant time investment to even start approaching Google Analytics.
After losing about a month of stats (which matters when you're also running AdSense), I ended up going back to Google. It took the same amount of time to set up as when I initially set it up: around 2 minutes of adding the tracking code and uploading it.
I'm building an analytics service, so thank you for the feedback! It's nearly as fast as Google Analytics and as simple as can be (although there's going to be a ton of new features soon).
So... I should install a script that loads from Google servers?
Apart from that, you probably don't want to tie yourself to google like that. Once the users have this in their pages they will _never_ update it. You should use your own domain.
Yes. You're absolutely right.
I'm just hacking stuff together right now (and recently moved every single piece of infra onto a local server from GCP). I'm still in that migration process.
> I tried several; they're all either non-functional out of the box, or require significant time investment to even start approaching Google Analytics.
It's almost as if you need to be a software engineer and do actual software engineering, to responsibly use tools like analytics.
> I ended up going back to Google. It took the same amount of time to set up as when I initially set it up: around 2 minutes of adding the tracking code and uploading it.
So how much effort is the privacy of your visitors worth, then?
It sounds like deep down you know the right thing to do, which is a lot of work, but seeing everybody (in your bubble of technical peers) just as easily use Google Analytics, makes you feel like you're owed the difference to these profits.
Maybe there shouldn't be a 2-minute turnkey solution to analytics, because even if it's self-hosted, your next excuse is going to be that it requires a significant time effort to keep it secure and act responsibly with the data.
> require significant time investment to even start approaching Google Analytics.
I think that for a lot of the "alternative" analytics tools, feature parity with Google Analytics isn't necessarily a goal, so this may explain your disappointment. I think the only exception here is Matomo, which is the only "advanced" OSS analytics as far as I know.
That's swell and all, but you are going to find businesses still going to Google Analytics because it is easy to set up and free. The cost to consumers of having their data shared everywhere isn't even a thought on the horizon.
I'm not even sure how that comment relates to mine. I just explained the goals of various OSS projects as I understand them. I never said anything about setup costs or price.
Please also bear in mind that any developer's personal time (whether your own or someone else's), while important and valuable - is also a trade-off against the aggregated time and value of your users' privacy.
... which, if users actually valued, they'd signal by ceasing to use one's site.
There is basically no strong indication right now of any large segment of users boycotting sites because the users care about privacy. There's the same small amount that have always been present and the number doesn't appear to be growing.
It seems to me like a person could care about “these n people lacking privacy in this way” more than n times as much as they would care about any one of them marginally gaining or losing privacy in that way?
Or, idk if that is quite the right formulation for what I mean.
But, at the least, it seems likely that some people will sometimes be willing to take an amount of effort to protect the privacy of a large number of people, when they wouldn’t take that same amount of effort to just protect their own to the same degree.
It seems likely to me that a major impact of lack of privacy comes from many people lacking privacy, in ways that wouldn’t happen if it were only a few lacking it, and also where a few re-gaining it doesn’t influence the impact all that much.
If so, then people not avoiding something because of privacy concerns in sufficient numbers to substantially influence the amount of use, doesn’t seem to entirely rule out that they care about privacy. Perhaps their behavior could be attributed to a collective action problem, where they each would prefer that all of them avoid it, but don’t find it worthwhile to be among only a small number of people avoiding it.
The headline is wrong, it should be changed to "Stop donating your customers' data to Google Analytics ... donate to another large corporation instead!"
There are much better options out there. Quite apart from the solutions listed in these comments, a better option is to reconsider whether you really need analytics at all. Maybe the answer is yes if you are a business trying to understand your customers. But not every blog and project page needs analytics.
I'm ashamed to admit that I use GA on my blog to essentially count page views. The other information is interesting but mostly unused (by me). I would be far better served by a tool or service handling server logs (any recommendations?). But GA is 0 friction, so it's what I picked up back in the day. I suspect there are a lot of people in this boat.
Even for page views, 10 lines of code won't replicate GA. Try counting how many hits, and you will find that all the bots and spiders quickly make the numbers meaningless.
Of course, if that is all you are doing, you should be using Matomo or Fathom or whatever, but it is not fair to say GA could easily be replaced.
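A rough sketch of the bot problem, assuming the server writes Combined Log Format and that a crude user-agent substring check is good enough for illustration. The `BOT_MARKERS` list and the function name are made up here, and real bot detection needs far more than this, since many crawlers send browser-like user agents.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical and deliberately incomplete list of crawler hints.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp", "curl", "wget")


def count_page_views(lines):
    """Count successful GET requests per path, skipping obvious bots."""
    views = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a Combined Log Format line
        agent = match.group("agent").lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # self-identified crawler
        if match.group("method") != "GET" or match.group("status") != "200":
            continue  # only count pages that were actually served
        views[match.group("path")] += 1
    return views
```

Comparing the filtered totals against a plain line count on your own logs gives a feel for how much of the raw traffic is not human.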
Many of the common web log analyzers are a bit long in the tooth.
I've used GoAccess for a while now and am mostly happy with it. It's fast enough and can generate pretty good-looking static HTML, which is mostly what you want for those simple use cases.
A side effect of processing log files is that you can freely try software on historical data.
Do you have a recommendation on log format for GoAccess? I run a lot of custom services with no nginx etc in front, so I'll have to figure out the logging myself.
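One low-effort option, assuming the custom services can write their own access log: emit lines in Combined Log Format, which GoAccess can read with its predefined `--log-format=COMBINED`. The helper name and its parameters below are hypothetical, and the sketch hard-codes `HTTP/1.1` as the protocol.

```python
from datetime import datetime, timezone


def combined_log_line(ip, method, path, status, size,
                      referrer="-", user_agent="-", when=None):
    """Format one request as a Combined Log Format line."""
    when = when or datetime.now(timezone.utc)
    # e.g. 10/Jan/2020:10:00:00 +0000 (month names assume the default C locale)
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    # Double quotes inside the fields would break the quoting, so swap them out.
    referrer = referrer.replace('"', "'")
    user_agent = user_agent.replace('"', "'")
    return (
        f'{ip} - - [{timestamp}] "{method} {path} HTTP/1.1" '
        f'{status} {size} "{referrer}" "{user_agent}"'
    )
```

Appending one such line per request to a file is enough to try a log analyzer on it later.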
I reckon you'd need more than 9 "\n" characters to get it done.
But in seriousness, the 10 lines would just use local storage or wotnot to store a tag, then call tracker.com?tag=... on each page load.
"Rest is done on the server (TM)"
I don't disagree, but in a bit of fairness, a lot of people just use GA for page-counts and basic correlation stuff...you could do that in a relatively small amount of frontend JS stuff and a slightly-more-complex backend API to handle the basic correlation stuff.
That said, that will only do about .01% of all the features of GA; like the infamous "FTP vs. Dropbox" the premise itself isn't exactly "wrong", just missing a bigger point.
Could you link me to the FTP vs. Dropbox discussion? I am curious. I haven't used either for years and the implication seems to be that there is a profound difference, so I wonder what it is.
fathom - Looks great. I am OK with closed source products (my motivation is self-hosting/privacy) but the direction is not clear to me. Maybe they will have a blog about it at some point - https://github.com/usefathom/fathom/issues/268. Having multiple code bases is going to be super hard.
goaccess.io - this analyses web logs
google-analytics-proxy - project is dead
matomo - this is what i use now and it works great. has a lot of quirks but if you spend some time, you can make it work.
ackee, goatcounter - simple but looks like this does not track users/sessions. it's mostly for page hits.
countly - looks good if you are enterprise. there is no pricing :(
freshlytics (from another thread) - page says it's in beta and not production ready
GoatCounter author here: doing some form of session tracking is on the roadmap; check back in a few months. The project is still quite new, with the first "real" release only being last week :-)
As for Fathom, I find that last "since that people are confused"-comment rather funny, since their messaging on this has been confused for almost a year, haha
Or just use server log analytics. Client-side analytics are a significant contributor to the proliferation of JavaScript bloat and unnecessary 3rd party cookies.
Depending on which solution you use; some of them are just a few KB, which is not so much.
Doing log-analysis has its own drawbacks: not everyone has access to them, bot traffic will be a lot higher, and certain information is hard to access (like screen size). You can't always "just" use it.
I've been surprised lately by how much more pleasant, readable and usable the news/blog-web is with JavaScript turned off. JavaScript is basically just used for the user-hostile ad-tech.
I've been thinking the same thing, and started[0] experimenting[1] with some ideas. I think it would be fun to make a web browser that implements a few HTML tags, flexbox and a few other CSS primitives (no animations), and no JS. Sites that are compatible with it would still work on current browsers.
It's why I'm somewhat against WASM, even though it's very cool from a technical standpoint. It makes the web even more of an operating system, where I'd like it to be less.
Agreed if you're already self-hosting. However, I don't know of a JS-free solution if you're hosting on a 3rd party like GitHub pages or Netlify for example.
(Netlify does sell access to log data but it looks expensive for most hobby / personal sites)
You can probably use the "tracking pixel" method with at least some analytics tools. This is a very old technique, which probably predates even the invention of JavaScript.
Basically, if it accepts a GET with query parameters, it should work.
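To make the idea concrete, here is a minimal sketch of the receiving end using only the Python standard library. The `/count.gif` path, the `p` and `r` parameter names, and the in-memory `HITS` list are all assumptions for illustration, not the API of any tool mentioned in this thread.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# A 1x1 transparent GIF, the classic tracking-pixel payload.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

HITS = []  # in-memory only; a real service would persist these


def record_hit(request_path: str) -> bool:
    """Store the query parameters of a pixel request; False if not the pixel."""
    url = urlparse(request_path)
    if url.path != "/count.gif":  # hypothetical endpoint name
        return False
    params = parse_qs(url.query)
    HITS.append({
        "page": params.get("p", [""])[0],
        "referrer": params.get("r", [""])[0],
    })
    return True


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not record_hit(self.path):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.send_header("Cache-Control", "no-store")  # ask browsers not to cache the pixel
        self.end_headers()
        self.wfile.write(PIXEL)

    def log_message(self, format, *args):
        pass  # silence the default per-request stderr log


def serve(port: int = 8000) -> None:
    """Run the pixel server until interrupted."""
    HTTPServer(("127.0.0.1", port), PixelHandler).serve_forever()
```

A static page would then embed something like `<img src="https://stats.example.com/count.gif?p=/blog/post" alt="">` (the domain is a placeholder), which works without any JavaScript.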
You can do it with one of the hosted services. I don't know which ones support it exactly, but GoatCounter does (although it's kind of an undocumented feature until I merge PR #122).
With an open-source data collection framework like RudderStack (an alternative to Segment), dumping data into a warehouse (Redshift/BigQuery/Druid etc.) and sticking another open-source visualization layer on top (e.g. SuperSet), it is possible to put together an alternative to Google Analytics. One of our early users did it, and we wrote a blog post about it.
My Firefox was a minor update earlier than the one on the sibling comment (72.0.1). It has updated now, and the site claims that my connection is down on my machine too.
Now that I have seen the message... It's a funny thing for a web page to claim.
The problem is usually competing with "free", and Google knows this. There are privacy-respecting alternatives like https://www.visitor-analytics.io though.
> Tracker blockers are increasing in popularity so consumers can protect themselves against this tracking, reducing the effectiveness of your analytics.
More to the point: there is probably going to be a bias in the analytics. Different people have different reasons for protecting themselves against tracking, but it is highly unlikely that people who are unaware of or disinterested in the issue will use a blocker.
You seem to have missed the arguments made, though.
You get to avoid the cookies and as someone else pointed out Amazon doesn't use the data. It's your data.
Did not read through, but from a quick look, I suspect anyone can grab the code and fill your AWS account with terabytes of garbage data, which will end up as an enormous AWS bill.
Interesting topic. This among others is one of the reasons we started building Harvest. Just as with Google Analytics, you can start tracking data with just a small snippet of Javascript.
We use Splunk as our data engine and you can install it on your own server. This way you have full control, access and ownership of your data without letting third parties get any data. In that sense Harvest is basically the infrastructure that allows you to collect, store, use and visualize your data.
Besides that, we have been focusing on features that will help companies comply with privacy regulations. Compliance has proven to be not always easy in the complex world of online data.
The suggested Google Analytics implementation today is a collection of three separate Google technologies: the original GA, Doubleclick cookies to track demographics and interest, and Tag Manager to manage them.
The original GA does not give Google useful cross-site user data because it uses only first-party cookies and anonymizes data as it collects it. To my knowledge you can still implement GA this way if you want to. Such an implementation would be GDPR compliant in not tracking any personal data, although your counsel might still say you need to list them as “analytics” cookies in a cookie banner (mine did).
No, they don't anonymize the collected data (for any reasonable definition of "anonymous"). The IP address alone gives GA a very close approximation of a unique key, and their own documentation[1] explains the "anonymization" process:
"... the last octet of the user IP address is set to zero ..."
(If the logged event doesn't opt in to this behavior by adding &aip=1, then GA presumably saves the entire IP. How many GA users bother setting that option?)
The 8 least significant bits of an IPv4 address are the least interesting. The remaining 24 bits give GA the ASN and are a lot of entropy for fingerprinting. It would be trivial to recover a unique key from the "anonymized" address by combining it with other analytics data, other cookies, and timestamps.
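A small sketch of the point, using Python's `ipaddress` module. The `fingerprint` function and its inputs (user agent, screen size) are purely illustrative assumptions about what could be combined with the masked address; nothing here is a claim about what GA actually computes.

```python
import hashlib
import ipaddress


def anonymize_ipv4(address: str) -> str:
    """Zero the last octet, as the quoted documentation describes."""
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


def fingerprint(address: str, user_agent: str, screen: str) -> str:
    """Hypothetical key built from the masked IP plus other request data."""
    key = "|".join([anonymize_ipv4(address), user_agent, screen])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Masking only merges the 256 addresses of one /24 block, so `fingerprint` stays stable for a visitor whose address moves within that block, while two visitors in the same block are still told apart as soon as their other attributes differ.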
Yes, you can configure Google Analytics so that no data is shared with other Google services, at least no data about single visitors. I also came to the conclusion that using GA this way complies with the GDPR and I don't really understand what all the fuss is about.
As someone going through this right now, the main difficulty in being GDPR compliant with GA is the cookie problem.
You can either disable cookies to run GA in cookieless mode [1], which presumably will affect how GA performs, since they can't determine repeat visits (but this might be fine, depending on the type of site you have), or you need to gain active consent to enable analytics cookies [2], which isn't much good if you want metrics for all users, not just those that opted in.
If someone has solved this reasonably, I'd love to hear how! For now it seems like cookieless is my only option.
> Such an implementation would be GDPR compliant in not tracking any personal data, although your counsel might still say you need to list them as “analytics” cookies in a cookie banner (mine did).
Your counsel should also have advised you that you need active consent in your cookie banner, since GDPR raised the standard for consent, which is the stumbling block I'm facing. [1]
I didn't know about AWS Pinpoint before, but from what I can see, it only offers analytics for email and other messages, not for web pages, so presenting it as a full alternative for Google Analytics is misleading.
The article doesn't even seem to mention any of the things for which GA is nearly indispensable: e-commerce analytics, conversion funnel visualization, customer segmentation, etc.
While running my first business, GA was not really useful (we used internal tools that were easier to integrate into the code and adapt to our needs).
However, GA data showed its usefulness when selling the business. The data was considered a trusted source of information by the buyer.
And all the definitions (unique user, etc) were aligned with the buyer's, so it was easier for them to assess the metrics.
I have created a simple workflow using AWS Lambda + Kinesis + S3 to track our customers and not have any 3rd party dependency. It took roughly 2 weeks, but it is worth it since we do not leak customer data and we have much tighter control over what we collect (no PII except the source IP, which gets hashed in the process).
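The comment doesn't say how the IP is hashed, so the following is only one possible approach, sketched under assumptions. A plain unsalted hash of an IPv4 address can be reversed by hashing all 2^32 addresses, so this version uses a keyed HMAC and mixes in the date so hashes cannot be linked across days. The function name, the `secret` handling, and the daily rotation are all illustrative choices.

```python
import hashlib
import hmac


def hash_ip(ip: str, secret: bytes, day: str) -> str:
    """Return a keyed, per-day pseudonym for an IP address.

    `secret` must stay server-side; `day` is a date string like "2020-01-10".
    """
    message = f"{day}|{ip}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

Within one day the same visitor maps to the same value, which is enough for counting uniques, and whether the result still counts as personal data is a question for counsel rather than for the code.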
FYI, if your setup relies on API Gateway, you could probably use VTL / Mapping Templates to send directly from API Gateway to Kinesis and skip the Lambda altogether, like some do for DynamoDB.
Everything is a 3rd party dependency then. The only way to not have a 3rd party dependency is to build your own infrastructure and use open source solutions (and even with OS you're still dependent).
I think OP was clearly referring to a self-managed solution as opposed to a set of 3rd party services like GA, Segment, etc., where the flow of data is out of your control.
I meant no 3rd party dependency on storing customer data that requires extra legal work in GDPR land. Maybe we need to include AWS in that though. I need to look into how cloud vendors count as 3rd parties in that sense. Is there a difference between Google Analytics and storing data on S3 even if we do not collect PII?
Why not spin your own? This tool comes with a lot of tools out of the box and can also run personalization techniques and more: https://harvest.graindata.com/en/store
A few weeks ago my blood needed a checkup.
They sent me the results by mail.
The results were on a non-password-protected but 'unguessable' URL. And the page of course contained Google Analytics. I'm in the EU; I wonder if this is legal.
If they haven’t notified you, and hence you couldn't/didn’t consent, it probably isn’t. Medical companies especially are scrutinized for following GDPR. You could make a case here to either the company's privacy officer or your country's privacy watchdog.
Don’t you think that, in today’s climate, customer knowledge and involvement about online tracking, fingerprinting, filter bubbles etc. is at an all time high? And that makes this the best time ever to indeed be proud to tell your customers that “we value your privacy”.
I'm currently building a free analytics service that's the fastest. Ever. Faster than Fathom, Simple Analytics, pretty much everything you can think of except Google Analytics.
Still building it, but you can sign up for when it launches here: https://forms.gle/MhojBWWfdiWjZatC7 (I know it's ironically on Google Forms and I'll move away soon)
If you're getting over a million hits I might add some incentive to donate, but mainly I take this as my payback to the developer community. I'm launching another product in a different domain at the moment and hoping that can compensate.
At the end of the day, if anything goes wrong, I'll always be happy to open source the whole thing.
> And if you think that's okay, you should take your head out of the sand because consumers are demanding it. Please tell me how many of your users like the large cookie agreement popups that they have to dismiss...I-I mean read and accept just to consume your content. Agreements that you're forced to have them agree to because you're using cookie-based trackers like GA.
I think that's the heart of why I so despise the GDPR. In an attempt to change site behavior, politicians passed a law putting a burden on sites that did an undesirable thing (rather than, say, making the undesirable thing itself illegal).
Perhaps they thought sites would avoid the burden.
Did they not anticipate full shifting of burden onto end-users? Because being able to know how a site is used is extremely valuable to the site's owners.
I tried Matomo (Piwik) recently, but I only do log analysis and it doesn't really treat log access as a first class citizen. If you use Javascript tracking, it's probably the right way to go.
I switched back to AWStats for my personal stuff. It's probably too basic for business or company apps, but for your personal stuff without javascript/cookies, it's still a great analytics tool.