by decremental 1824 days ago
> Bot Removal & IP removal

I suspect that people who advocate for alternatives to GA miss this crucial aspect of the service. When you have a long-running site, especially one that has user accounts or conducts ecommerce, its traffic can be dominated by automated requests. The longer the site has existed and the more popular it is, the more this is the case.

I'm not even referring to legitimate search engine crawlers, but to the automated exploit bots, the spam bots, the people running site suckers, and who knows how many other things people get up to for malicious purposes.

Any GA alternative that relies on server logs is never going to be viable. Alternatives that don't rely on logs still cannot do a sufficient job of weeding out the automated traffic. I have never, not even once, heard of a solution that can tackle this overwhelmingly critical problem.

1 comments

This is a big problem and exactly why I had to abandon log-based analytics for my personal blog. Some bots are easy to spot but others seem to be running real browsers, or as good as, which makes it impossible to weed out bad actors from logs alone.
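To illustrate the point, here is a minimal sketch of the kind of User-Agent filter that catches the easy bots in an access log, and of why it fails against the hard ones. It assumes logs in the common "combined" format; the pattern list, the function name `looks_like_bot`, and both sample log lines are made up for the example and are not taken from any real tool or from my own logs.

```python
import re

# Hypothetical substrings that self-identifying bots tend to put in their User-Agent.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|curl|python-requests", re.IGNORECASE)

# Tail of a combined-format log line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

def looks_like_bot(log_line: str) -> bool:
    """Return True if the User-Agent field alone marks the line as automated."""
    match = LOG_TAIL.search(log_line.strip())
    if match is None:
        return True  # unparseable line: treat it as suspicious
    ua = match.group("ua")
    return ua in ("", "-") or bool(BOT_UA_PATTERN.search(ua))

# Made-up sample lines (documentation IP addresses, invented bot name).
honest_bot = (
    '203.0.113.5 - - [10/Jun/2021:12:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "ExampleBot/1.0 (+http://example.com/bot)"'
)
disguised_bot = (
    '203.0.113.9 - - [10/Jun/2021:12:00:01 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"'
)

print(looks_like_bot(honest_bot))     # flagged: the UA announces itself
print(looks_like_bot(disguised_bot))  # not flagged: the UA matches a real browser
```

The second line is indistinguishable from a human visitor as far as this filter can tell, which is the whole problem: nothing in the log line itself says the request was automated.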

I ended up coding a simple hit tracker (that was all I wanted) with a JavaScript beacon, which worked well for a couple of years. But recently my blog was hit by an ongoing attack[0] that even executed the beacon code. I have no idea why. I am not convinced that any of these services would discount this type of traffic. I ended up having to use Cloudflare, and even then I needed some custom firewall rules.

[0] https://sheep.horse/2021/6/botnets%2C_or_this_is_why_we_cann...

I can't recommend Cloudflare enough. It has become a critical service to me. It recently helped me out a ton when bot traffic increased dramatically out of nowhere. I was getting hit 5 million times a week by one type of bot alone and Cloudflare's automated bad bot detection completely mitigated it with a single click. It has also been my experience that automated traffic has become so much more sophisticated in recent years.

Cloudflare is a great service but I hate that I have to use it due to factors beyond my control.

I was disappointed that Cloudflare didn't automatically detect the traffic as malicious, but feeding the bots a CAPTCHA almost completely mopped up the problem.