by mejakethomas 2594 days ago
(data engineer here)

Nice post! It's always fun reading about people being creative and challenging the analytics status quo (aka GA). Besides the joy of doing it yourself, you've accomplished a couple other things worth mentioning:

1. You'll never be sampled. GA samples historical data pretty heavily, and you have to pay for 360 to retain unsampled event data (to the tune of $160k+ per year).

2. You have full access to all generated data.

I'd highly recommend using Snowplow's JavaScript tracker (https://github.com/snowplow/snowplow-javascript-tracker) in a very similar manner to what you've outlined here. You'll get a ton of extra functionality out of the box, which would add yet another level of insight. With Snowplow, you get the following for free:

1. Sessionization, which is consistent with Google Analytics' definition - effectively a 30-minute window of activity.

2. User identification - the tracker drops a persistent cookie (just like GA), so you can see returning visitors.

3. Tools for splitting requests

4. A variety of event types, out of the box: https://github.com/snowplow/snowplow/wiki/2-Specific-event-t...

5. Ability to respect Do Not Track

6. Time on page, browser width/height, etc

7. Ability to make your event tracking 100% first-party

(Disclaimer: I don't work for them, but I've seen the system work very well a number of times.)
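
To make the sessionization point concrete, here's a rough sketch of the "30 minutes of inactivity" rule applied to one visitor's event timestamps. This is only an illustration of the idea, not the Snowplow tracker's actual code; the function name `sessionize` and the `SESSION_GAP` constant are made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes without activity.
SESSION_GAP = timedelta(minutes=30)


def sessionize(timestamps):
    """Group one visitor's event timestamps into sessions.

    A new session starts whenever the gap since the previous event
    is longer than SESSION_GAP.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP:
            # Still within the inactivity window: same session.
            sessions[-1].append(ts)
        else:
            # First event, or the gap was too long: start a new session.
            sessions.append([ts])
    return sessions


events = [
    datetime(2020, 1, 1, 10, 0),
    datetime(2020, 1, 1, 10, 10),
    datetime(2020, 1, 1, 10, 50),  # 40 minutes after the previous event
    datetime(2020, 1, 1, 11, 0),
]
sessions = sessionize(events)
print(len(sessions))
```

In practice you'd run this per visitor (keyed on the persistent cookie ID), which is why items 1 and 2 go hand in hand.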

I'm running a similar setup on my blog, and it costs well under $1 per month: https://bostata.com/client-side-instrumentation-for-under-on.... I'm doing the exact same thing with CloudFront log forwarding and have several Lambda functions that process the files in S3. From there, I visualize traffic stats with AWS Athena (but retain a ton of flexibility, since they are all structured log files).
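
For anyone curious what the log-processing step can look like, here's a minimal sketch of parsing CloudFront standard log lines in Python. It is not the code from my setup. It assumes the usual layout (a `#Fields:` header followed by tab-separated rows), and the sample rows, the `/i` path, and the `e=pv` parameter are made up for illustration. Real logs have many more columns.

```python
from urllib.parse import parse_qs


def parse_cloudfront_log(lines):
    """Yield one dict per request, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Header names are separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            # Skip other comment lines, blanks, and rows before the header.
            continue
        # Data rows are tab-separated.
        yield dict(zip(fields, line.split("\t")))


def query_params(record):
    """Decode the query string column; "-" is treated as an empty value."""
    raw = record.get("cs-uri-query", "-")
    if raw == "-":
        return {}
    return {key: values[0] for key, values in parse_qs(raw).items()}


# Abbreviated, hypothetical sample (only 5 columns).
sample = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-uri-stem cs-uri-query",
    "2019-06-01\t12:00:00\t203.0.113.5\t/i\te=pv&page=home",
    "2019-06-01\t12:00:05\t203.0.113.5\t/i\t-",
]
records = list(parse_cloudfront_log(sample))
print(query_params(records[0]))
```

In a Lambda you'd feed this the lines of the gzipped object that landed in S3 and write the resulting rows somewhere Athena can query them.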