Show HN: Ackee – Self-hosted website analytics (ackee.electerious.com)
214 points by electerious 2483 days ago
16 comments

A side project I've been working on for a while now. Ackee is a self-hosted, Node.js based analytics tool for those who care about privacy. It runs on your own server, analyses the traffic of your websites and provides useful statistics in a minimal interface. If you have any questions or feedback, just write it here :) Thanks!
Piwik/Matomo is a popular tool in the same space. How would you compare Ackee to Matomo (tech aside)?
Piwik/Matomo is more like Google Analytics in the way it works and what you can do with it. Ackee will never be a replacement if you need full-featured marketing analytics with tons of options and insights. Ackee tries to be less. Lightweight, easy to install and with a good balance between analytics and privacy.
What about vs. the old perl script, awstats? I'm still using that, amazingly, after like 15 years. It does the basics fairly well, if a little ugly. I'm always looking for new cool stuff though.
Log file analytics does not work with Javascript apps.
True, but log file analytics work despite JavaScript being disabled in the browser.
What happens when you start to accumulate over 5 million+ records (as an arbitrary large number)? Or a couple gigs of stats data?

We used to provide our own stats software to our customers before it became an utter nightmare of indexing databases, cron jobs, report generation, and customers stepping over their hosting quotas because of all the stats data.

If your reports are predefined (summarised, with no ad hoc reporting needed), you can look into the continuous view feature of PipelineDB - http://docs.pipelinedb.com/continuous-views.html

> As soon as a stream row has been read by the continuous views that must read it, it is discarded. Raw, granular data is not stored anywhere. The only data that is persisted for a continuous view is whatever is returned by running a SELECT * FROM that_view. Thus you can think of a continuous view as a very high-throughput, realtime materialized view.

I use PipelineDB continuous views in Freshlytics - https://github.com/sheshbabu/freshlytics/blob/master/src/ser...

TimescaleDB also has something called "Continuous Aggregates" - https://docs.timescale.com/latest/using-timescaledb/continuo... but I haven't looked into it much.
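
To make the continuous view idea quoted above concrete, here is a rough sketch in plain Python rather than PipelineDB: each incoming event is folded into a running aggregate and then dropped, so only the summary is ever kept. The class name `DailyPageViews` and its methods are made up for illustration and are not part of PipelineDB, TimescaleDB or Freshlytics.

```python
from collections import Counter
from datetime import datetime, timezone


class DailyPageViews:
    """Keeps only per-day, per-path counts; raw events are never stored."""

    def __init__(self):
        self.counts = Counter()

    def ingest(self, path, timestamp):
        # Fold the event into the aggregate; nothing else about it is retained.
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()
        self.counts[(day, path)] += 1

    def report(self):
        # Roughly what reading the whole view would return: one row per (day, path).
        return sorted((day, path, n) for (day, path), n in self.counts.items())


# Hypothetical events as (path, unix timestamp) pairs.
view = DailyPageViews()
for path, ts in [("/", 0), ("/about", 60), ("/", 86_400)]:
    view.ingest(path, ts)
```

Storage then grows with the number of distinct (day, path) pairs instead of the number of hits, which is the property that matters for the quota problem described earlier in the thread.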

I can't speak to the OP's performance, but Snowplow[1] may be good for your use. It's a bit more general purpose than just pure web analytics, and doesn't come with the reporting interface. But as far as the data collection and storage at scale aspect, it's fantastic.

[1] https://snowplowanalytics.com/products/snowplow-open-source/

> What happens when you start to accumulate over 5 million+ records

I don't think Ackee is the right tool if your site has so many visitors. And to be honest: I don't know what will happen at this scale. There shouldn't be a problem as long as MongoDB can handle it.

Agree, just make sure you are running a modern version of MongoDB with WiredTiger storage engine and the server has enough memory. It should be able to handle 5 million records on modest hardware.
Looks good! We removed analytics from our sites for privacy and simplicity reasons so I’ll consider this. Does it use JavaScript clientside or hook up to express or nginx or both?

Before google analytics I remember there were a few self hosted tools in this space. Would you mind adding a list of the most current ones that people are using? I need to get caught up.

Couple of ones I can think of:

Fathom: https://github.com/usefathom/fathom

Goatcounter: https://github.com/zgoat/goatcounter

Freshlytics: https://github.com/sheshbabu/freshlytics

There's also a non-oss but privacy focussed analytics - http://simpleanalytics.com

Disclaimer: I'm the creator of Freshlytics

You may want to add Countly (for both web and mobile) to that list (https://github.com/countly/countly-server) - which can be deployed on DO easily as well for free (https://marketplace.digitalocean.com/apps/countly-analytics)
Ackee uses a JS snippet to get the data. Nginx logs won't work, but the API is fully documented and it should be possible to build an import / custom script. Could also work with express.
How does this compare to Fathom?
Ackee and Fathom are very similar. Both in the way they display data and how they process it. The biggest advantages of Ackee (compared to Fathom) are probably:

- A documented REST API that lets you build upon Ackee. Could be used for custom import scripts or apps that display your current visitor stats in the menu bar.
- Ackee allows you to track more than just page/site views (browser, system, etc.). This is optional and off by default, but great for people/companies that need more insights.

Fathom definitely has a head start, while Ackee is very young. Can't wait to see what Fathom v2 will look like and how both will compare in the future.

There is a huge need for massive innovation in this field. As with many other products Google came in and sucked all the air out of the room. I'll be installing this on my website.
This looks nice and clean. Is it all domain based, or can I drill down to view hits of individual URLs?

+1 for self-hosting analytics, something I encourage everyone to do if you have the time to devote to it. I dislike sites handing my browsing habits off to random third-parties.

Obviously, really large sites need more sophisticated analytics, but for most sites something like Google Analytics is complete overkill. It grabs information that nobody except Google's machine learning bots is ever going to look at.

Glad you like it!

There's still a lot missing in Ackee, but even with all the features I'm planning to add it will never be a full replacement for Google Analytics. And I think that's the best thing about Ackee, because not everyone needs full-featured marketing analytics with tons of options and insights.

Ackee can also show page views, but it's not implemented in the UI, yet. This is the same for browser, system, visit duration and other insights (all of them are optional to track and turned off by default). Keep an eye on https://github.com/electerious/Ackee/issues/35 to know when it's ready.

Looks very promising. Might I also suggest a few other useful metrics:

Bounce Rate: The percentage of visitors to a particular website who navigate away from the site after viewing only one page.

Aside from that one, it would be useful to track Page Views per Session, PVs per Page Type (homepage versus article page), and even Page Depth (per Page Type).

Just a thought. Still really nice.
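
For reference, the two headline metrics suggested above reduce to simple arithmetic once you know how many pages each session viewed. This is only a sketch with made-up numbers; the function names are hypothetical and nothing here reflects how Ackee stores or computes anything.

```python
def bounce_rate(pages_per_session):
    """Percentage of sessions that viewed exactly one page."""
    if not pages_per_session:
        return 0.0
    bounces = sum(1 for n in pages_per_session if n == 1)
    return 100.0 * bounces / len(pages_per_session)


def page_views_per_session(pages_per_session):
    """Mean number of page views across all sessions."""
    if not pages_per_session:
        return 0.0
    return sum(pages_per_session) / len(pages_per_session)


# Hypothetical page counts for four sessions.
sessions = [1, 3, 1, 5]
```

With these sample sessions, two of the four are single-page visits, so the bounce rate is 50%, and the ten page views average out to 2.5 per session.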

> Bounce Rate

This requires tracking sessions.

> Page Views per Session

This requires cookies or some other mechanism for stringing individual page hits into a single stream. While you can, sort of, kind of, do that based on the source IP combined with the user-agent (like this project does), it's not terribly accurate and it can't track longer sessions.

I would argue that very few people object to being tracked by the site they are actually visiting. Most object to being tracked by 3rd parties, and across multiple unrelated sites. So a simple 1st party cookie is quite acceptable, but you can certainly go an extra mile and ask even if it can be set.
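
A minimal sketch of the cookie-less approach described above: hits are strung together by a hash of IP and user agent, and a gap of inactivity starts a new session. The 30-minute timeout, the salt and all names are assumptions for illustration; this is not taken from Ackee's code.

```python
import hashlib

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed value)


def visitor_key(ip, user_agent, salt):
    # Hash so the raw IP is not kept around; the salt is a hypothetical secret.
    return hashlib.sha256(f"{salt}|{ip}|{user_agent}".encode()).hexdigest()


def sessionize(hits, salt="example-salt"):
    """Group (timestamp, ip, user_agent, path) hits into sessions.

    Returns a list of sessions, each a list of paths in visit order.
    """
    last_seen = {}  # visitor key -> timestamp of that visitor's latest hit
    current = {}    # visitor key -> that visitor's open session
    sessions = []
    for ts, ip, ua, path in sorted(hits):
        key = visitor_key(ip, ua, salt)
        if key not in current or ts - last_seen[key] > SESSION_TIMEOUT:
            # First hit from this key, or the previous session timed out.
            current[key] = []
            sessions.append(current[key])
        current[key].append(path)
        last_seen[key] = ts
    return sessions


# Hypothetical hits using documentation-range IP addresses.
hits = [
    (0, "203.0.113.7", "Firefox", "/"),
    (120, "203.0.113.7", "Firefox", "/docs"),
    (200, "198.51.100.4", "Safari", "/"),
    (5000, "203.0.113.7", "Firefox", "/"),
]
```

The inaccuracy mentioned above shows up directly: two people behind the same NAT with the same browser collapse into one key, and a visitor whose IP changes mid-visit is split into two sessions.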

Thanks for the suggestions! There's still a lot missing, but the base is ready and I can't wait to add new features.
This is great. I think there's a market for self-hosted/on-prem analytics software due to the mounting privacy fears and browser-default third-party request blocking. Outside of piwik/matomo, what else is out there?
I'm always interested in these. Would love to get off Google Analytics. Alas, they always seem to forget to offer event based interaction tracking (e.g., button clicks). Most of our core metrics are tied to events of this sort, and rolling our own separate event log doesn't satisfy our needs either.
Event tracking is probably the best way to learn more about your users. It's something I will definitely look into as soon as all the basics are done.

I've created an issue for it if you want to keep an eye on the progress: https://github.com/electerious/Ackee/issues/40

Looks very nice! Simple, clean, and attractive.

This seems to me like a perfect job for a serverless solution, given how popular static sites have become. Serverless would provide minimal cost when you don't have a lot of traffic (and could scale if you get HackerNews'ed).

I was thinking "well if you still store user identifiable data it's still going to infringe privacy" but then I was satisfied when I read:

> No unique user tracking

on the site.

Is there any sort of user management, so each user has access to stats about their own website? We are building a WordPress hosting company and would love to offer this as an alternative to Google Analytics to all customers.

Right now, each customer gets its own Fathom install, but I would prefer to have it centralised to make management easier and decrease the amount of non-standard packages on the WordPress containers.

Have you considered Matomo? https://matomo.org/
We have, but as mentioned by the author of this post, Matomo is more like Google Analytics. We want something simpler, with bigger focus on privacy.
Depending on where you draw the line on "privacy", Matomo can also be hosted by you, if you don't mind the extra burden, and one installation can track multiple sites.

Edit: Not advocating in favour of Matomo, it's just that it is a solution which seems quite full featured. Personally I found it painful enough to work with it (from a devops perspective) that I wouldn't touch it with a ten foot stick.

We do have an instance of Matomo installed for testing, but it still feels like just a self-hosted version of Google Analytics.

Way too many features that a person with a blog won’t need. Something simpler like Fathom or this one, which was built with privacy in mind from the get-go — instead of just “I want to get away from Google” —, seems like a better solution.

To be clear, I am not saying Matomo is a bad piece of software. Just that it does not fit our needs.

Is there a Jamaican connection, like in ackee and saltfish[1]? (Also famous from Harry Belafonte, "ackee rice / saltfish is nice")

1: https://en.wikipedia.org/wiki/Ackee

> for those who care about privacy.

As a user, I don't see tracking by Google specifically as the problem; what I'm against is being tracked _at all_ - by anyone, self hosted or not.

There's "caring about privacy" in the subheadings, yet there's a whole section in docs about collecting private data [1]. Empty words.

I've used goaccess [2] in the past to provide traffic analytics. It reads from nginx/apache logs. You only get access to what browsers send anyway, and users who alter their user agents are in the minority, so they won't affect analytics much.

[1] https://github.com/electerious/Ackee/blob/1cf7779/docs/Anony...

[2] https://goaccess.io/
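
To illustrate the log-based approach mentioned above, here is a small sketch that counts page views from access log lines in the Combined Log Format commonly written by nginx and Apache. It is not how goaccess works internally; the regular expression and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter

# One line of Combined Log Format:
# ip ident user [time] "METHOD path protocol" status bytes "referrer" "user agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_page_views(lines):
    """Count successful GET requests per path; IP and user agent are not kept."""
    views = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m["method"] == "GET" and m["status"].startswith("2"):
            views[m["path"]] += 1
    return views


# Hypothetical log lines using documentation-range IP addresses.
sample = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2019:13:55:40 +0000] "GET /about HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Oct/2019:13:56:01 +0000] "GET / HTTP/1.1" 200 2326 "-" "curl/7.64"',
    '198.51.100.4 - - [10/Oct/2019:13:56:02 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.64"',
]
views = count_page_views(sample)
```

The parser sees the IP address and user agent of every request even though it only keeps per-path counts, so the log file itself still holds that data unless the server is configured to omit it.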

Why do you consider some data your browser provides as private and some not? Not saying I disagree, but hardliners could say they don't want anything the browser provides to be tracked in any way. Saying this app is hypocritical because they support a different notion of privacy is akin to saying your comment is hypocritical because you say you're against being tracked at all, then talk about tracking browsers via web server logs.

I think we need to recognize privacy as a spectrum and applaud self-hosted alternatives to traditional saas services even if they don't conform to your notion of tracking.

Browsers send too much information on their own, and they shouldn't. But there is still a huge difference between analyzing access logs and tracking users. When you actively track users you build complete profiles of people with any kind of information. Access logs do not go nearly as far; they're mostly limited to the IP address, some information about browser/OS type, and sometimes a referrer, most of which are easy to spoof.
Sounds like a difference in perspectives; you as a user think of data about you as personal and private, while an operator of a site using Ackee thinks of their sites' tracking data as private. Quite amusing, really. :)
I released an analytics tool earlier this week that doesn't collect PII or set cookies. All the tracking events are aggregated and you can't even see the tracking logs of a particular visitor or visitor's browser as with log based analytics.

https://github.com/sheshbabu/freshlytics

> What I'm against is being tracked _at all_ - by anyone, self hosted or not.

> I've used goaccess. It reads from nginx/apache logs.

While I'm using GoAccess myself and I too recommend it over using JS libraries, your server logs usually contain private data as well (user agent, ip address).

> There's "caring about privacy" in the subheadings, yet there's a whole section in docs about collecting private data [1]. Empty words.

The advanced tracking is turned off by default. Ackee will never store a browsing history of a user and tries its best to keep tracked data anonymised.

It's all about finding a balance between privacy and analytics. At the end it's still an analytics tool and there would be nothing to show without data.

Using "nginx/apache logs" and "what browsers send anyway" is more than Ackee tracks by default. Storing and analysing this data isn't even allowed by the GDPR without asking the user.

Do I need to add the javascript to get any data or can I also just use the nginx logs?
Ackee uses a JS snippet to get the data. Nginx logs won't work, but the API is fully documented and it should be possible to build an import / custom script. Could be a great addition :)
I'd been hoping for an actively maintained spiritual successor to Inman’s Mint. https://haveamint.com
I used Mint and started working on Ackee after it was abandoned. It took me several rewrites and years, but it all started with Mint as an inspiration :D
Anyone have any tales to tell around migrating Google Analytics data into a self-hosted system and how well it worked? I don't see such a feature for this one.
How did you come up with that name?
All started with my self-hosted photo-management tool Lychee (https://lychee.electerious.com). We named it Lychee because we had some lychees on the desk while working on the first version. It was the only cool name we could come up with.

All my tools/webapps after Lychee are somehow following the same species or clade: Rosid, Malvid and now Ackee.

I presume I have to include a `cookie` tracking acceptance modal for GDPR compliance. Correct?
Ackee only tracks page views, site visits and referrers by default. It stores no IPs and it's not possible to view the browsing history of a user. The data can't be used to identify a user and should therefore be GDPR compliant.

But: I'm not a lawyer. I can't give you a legally certain statement. It's hard to say what's "identifiable data" according to the GDPR.