| (data engineer here) Nice post! It's always fun reading about people being creative and challenging the analytics status quo (aka GA). Besides the joy of doing it yourself, you've accomplished a couple of other things worth mentioning: 1. You'll never be sampled. GA samples historical data pretty heavily, and you have to pay for 360 to retain unsampled event data (to the tune of $160k+ per year). 2. You have full access to all generated data. I'd highly recommend using Snowplow's JavaScript tracker (https://github.com/snowplow/snowplow-javascript-tracker) in a very similar manner to what you've outlined here. You'll get a ton of extra functionality out of the box, which would add yet another level of insight. With Snowplow, you get the following for free: 1. Sessionization that is consistent with Google Analytics' definition: effectively, a session ends after 30 minutes of inactivity. 2. User identification: the tracker drops a persistent cookie (just like GA), so you can see returning visitors. 3. Tools for splitting requests. 4. A variety of event types out of the box: https://github.com/snowplow/snowplow/wiki/2-Specific-event-t... 5. The ability to respect Do Not Track. 6. Time on page, browser width/height, etc. 7. The ability to make your event tracking 100% first-party. (Disclaimer: I don't work for them, but I've seen the system work very well a number of times.) I'm running a similar setup on my blog, and it costs well under $1 per month: https://bostata.com/client-side-instrumentation-for-under-on.... I'm doing the exact same thing with CloudFront log forwarding and have several Lambdas that process the files in S3. From there, I visualize traffic stats with AWS Athena (but retain a ton of flexibility, since they are all structured log files). |
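
For anyone wondering what the 30-minute sessionization amounts to in practice, here is a rough sketch in Python. To be clear, this is not Snowplow's or GA's code: `sessionize` and `SESSION_TIMEOUT` are names made up for illustration, and it assumes you already have one visitor's event timestamps parsed out of your logs. It also assumes that a gap of exactly 30 minutes still counts as the same session, which real trackers may handle differently.

```python
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes without any event.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(timestamps, timeout=SESSION_TIMEOUT):
    """Group one visitor's event timestamps into sessions (lists of timestamps)."""
    sessions = []
    for ts in sorted(timestamps):
        # Stay in the current session if the gap since the last event is small enough.
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            # Otherwise the visitor was idle too long, so start a new session.
            sessions.append([ts])
    return sessions


# Hypothetical events for a single visitor.
events = [
    datetime(2019, 1, 1, 9, 0),
    datetime(2019, 1, 1, 9, 10),
    datetime(2019, 1, 1, 9, 45),  # 35 minutes after the previous event
    datetime(2019, 1, 1, 11, 0),
]
sessions = sessionize(events)
print(len(sessions))
```

With a tracker that sets a persistent visitor cookie you would run something like this per visitor ID, but if you only have raw CloudFront logs you would first need some stand-in for visitor identity.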