by alias_neo 1954 days ago
I have always used GoAccess on my blog (https://2byt.es), which gets very modest traffic because I don't post much and don't advertise beyond my few Twitter followers. Privacy has always been a core principle of mine.

I've found that, over time, crawlers drown out the actual visitors, and I find it hard to get any meaningful data out of GoAccess when interesting things do happen.

Can anyone suggest a way I can do something similar to this without relying on a service I don't host (and without having to write parsers into a SQL or similar DB by hand)?

6 comments

lnav[0] can take care of parsing the log files and displaying them in a TUI. There's also a SQLite interface for running queries. However, you'll need to build the filters/queries yourself; there aren't any built-in ones at the moment.

[0] - https://lnav.org
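
To give a feel for what "build the queries yourself" could mean, here's a rough sketch that uses Python's built-in sqlite3 module rather than lnav itself. The `hits` table, its columns, and the helper names are made up for the example (they are not lnav's schema), and it assumes logs in the common "combined" format.

```python
import re
import sqlite3

# Assumed input: Combined Log Format, i.e.
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def load_hits(lines):
    """Parse log lines into an in-memory SQLite table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hits "
        "(ip TEXT, time TEXT, method TEXT, path TEXT, status INTEGER, agent TEXT)"
    )
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that don't look like combined-format entries
        db.execute(
            "INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)",
            (m["ip"], m["time"], m["method"], m["path"], int(m["status"]), m["agent"]),
        )
    return db


def top_paths(db, limit=10):
    """Count distinct client IPs per path for successful (200) requests."""
    return db.execute(
        "SELECT path, COUNT(DISTINCT ip) AS visitors FROM hits "
        "WHERE status = 200 GROUP BY path ORDER BY visitors DESC, path LIMIT ?",
        (limit,),
    ).fetchall()
```

Something like `top_paths(load_hits(open("access.log")))` would then list the most-visited paths; the file name is just a placeholder.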

Perfect, thank you, this is right up my street!

I host a static site for my blog, using Hugo so "no server" etc is exactly what I need, and writing the filters/queries myself leaves me in control of getting what I need out of them.

I'm waiting as well for the ability to use filters directly from goaccess. Hope they get to it soon! https://github.com/allinurl/goaccess/issues/117
https://pirsch.io/ I'm one of the founders :) Check my post above (or below, who knows on HN?) for more details, or ask me.

[Edit] Sorry, I misread "which I don't host" as the opposite. You can check out the open-source core library; it might work for you if you put in some work.

Wow, Go support, thanks.
There is the --ignore-crawlers argument. On my modest projects it seems effective, but I haven't looked at it too closely.
I've always had this flag in my run script, but I still see huge amounts of crawler traffic. I might need to look at that again.
For a long time I've thought about scavenging the robot identifiers from Matomo (née Piwik), so one could leverage the hive mind to keep robot identifiers up to date and use them to strip plain access logs for easier use with tools like GoAccess.
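
A minimal sketch of that stripping step, assuming combined-format logs where the user agent is the last quoted field. The `BOT_MARKERS` tuple is a hypothetical placeholder, not Matomo's actual list or format; a real version would load identifiers from whatever maintained source you trust.

```python
import re

# Hypothetical placeholder markers; swap in identifiers from a maintained list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# In the combined log format the user agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def is_bot(line, markers=BOT_MARKERS):
    """Return True if the line's user agent contains any marker (case-insensitive)."""
    m = UA_RE.search(line)
    if m is None:
        return False  # no recognisable user agent; keep the line
    agent = m.group(1).lower()
    return any(marker in agent for marker in markers)


def strip_bots(lines, markers=BOT_MARKERS):
    """Keep only the lines whose user agent matches none of the markers."""
    return [line for line in lines if not is_bot(line, markers)]
```

The filtered lines could be written to a new file and handed to GoAccess as usual. Substring matching like this is crude and will miss crawlers that pretend to be browsers.
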
I use Cloudflare Web Analytics. Since I use Cf, I thought: why not utilize their analytics? Anonymous, no cookie, no fingerprinting and no localStorage.

Edit: also, no JS if you have Pro (20 USD/mo).

I've read that DNS analytics isn't accurate either.
Semi-related question: Is any type of web analytics 100% accurate?
I meant not as accurate as client-side tracking would be if ad blockers weren't used, as far as I understand it.
Tracking visitors on the client side undercounts the actual number of visitors. Server-side tracking, on the other hand, gives a more accurate number, with the tradeoff of not knowing for sure whether the client is a human behind a browser.
But even if ad blockers weren't used, people still disable JS, JS files fail to load, JS takes longer to load than the user stays on the site, and there are many other potential issues.

My comment was not directly related to yours; I agree that DNS analytics are probably worse. I was just wondering whether it is theoretically possible to produce high-accuracy analytics when everything can be spoofed, cached, etc.