by alexpopescu 5753 days ago
I fail to see why analytics data seem to be considered "low quality" data ("we don't care if some stuff occasionally gets lost"). As far as I can tell, most businesses out there are driven by metrics which are derived from analytics data... so I don't agree that "it's OK to lose some".
3 comments

I use MySQL and Redis for persistence, depending on the type of data. Both get written to on a purchase: MySQL upgrades the account info, Redis holds my A/B testing stats, where several tests have just scored points. If the MySQL write fails, I have a CS emergency because my customer can't get what she paid for and I probably just ruined a lesson plan for tomorrow. If one of the Redis writes fails, my A/B test results, which I won't look at for a week anyhow, shift in a way that almost certainly doesn't alter my final decision.

It is absolutely OK to lose analytics data occasionally, and indeed with the variety of ways to bork that (JS is off, user agent prefetches an undisplayed page, bot action, etc.), if your stats aren't robust against it you are screwed anyhow.
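A minimal sketch of the split described above, in Python: the account upgrade is treated as a write that must succeed, and the A/B stats as best-effort writes. The function `record_purchase` and the two callables passed to it are hypothetical stand-ins for whatever actually talks to MySQL and Redis; nothing here is the commenter's real code.

```python
import logging

logger = logging.getLogger("purchase")


def record_purchase(upgrade_account, record_ab_conversion, user_id, test_names):
    """Upgrade the account (must succeed) and record A/B stats (best effort).

    upgrade_account and record_ab_conversion are hypothetical callables that
    wrap the real database clients. Returns the number of A/B writes dropped.
    """
    # Critical write: any exception propagates so the caller can retry or alert.
    upgrade_account(user_id)

    dropped = 0
    for name in test_names:
        try:
            record_ab_conversion(name, user_id)
        except Exception:
            # Best-effort write: log it and move on; the purchase itself is done.
            logger.warning("dropped A/B conversion for test %r", name)
            dropped += 1
    return dropped
```

The design choice is only about where failures are allowed to surface: a failed account upgrade stops everything, while a failed stats write costs one data point.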

If they use it for statistical analysis then the sample size decreases a bit, but likely not in a significant way. If they lost all their data that would be a bit different.

For instance, I routinely delete web server logs older than 30 days, on the assumption that if I didn't need them in the last 30 days I'll probably never need them. Every now and then this bites me: I need more than 30 days of data to test some assumption (for events that occur infrequently enough), and then I just have to wait for a bit.
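A retention policy like the one described above could look roughly like this in Python. The function name, the `*.log*` filename pattern, and the use of file modification time as the age are all assumptions made for illustration, not details from the comment.

```python
import time
from pathlib import Path


def delete_old_logs(log_dir, max_age_days=30, now=None):
    """Delete log files in log_dir whose mtime is older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age limit in seconds

    removed = []
    # Assumed naming scheme: access.log, access.log.1, error.log.gz, ...
    for path in Path(log_dir).glob("*.log*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Raising `max_age_days` is the obvious knob if rare events turn out to need a longer window.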

What jacquesm said. It depends on the kind of analytics data; our data is not as important as someone else's analytics data. For us it's more important that writes are of reasonable speed, that we can store the data in arbitrary structures in a schemaless way, that we can define arbitrary search indexes and that the data can be horizontally spread across multiple servers.

Website visit counters are a great example. I don't think many people care if a few visits get lost once in a while.
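To put a rough number on the sample-size point made earlier in the thread, here is a small Python sketch. It assumes the lost events are a random subset (loss unrelated to what is being measured) and uses the usual standard error of a proportion; the function names are illustrative only.

```python
import math


def standard_error(p, n):
    """Standard error of an observed proportion p over n samples."""
    return math.sqrt(p * (1 - p) / n)


def relative_widening(loss_fraction):
    """Factor by which the standard error grows when a random
    fraction of the samples is lost (n becomes n * (1 - loss_fraction))."""
    return 1 / math.sqrt(1 - loss_fraction)


# Losing 1% of events at random widens the error bars by about 0.5%.
print(round(relative_widening(0.01), 4))  # 1.005
```

If the loss is not random (for example, only one browser's events go missing), this estimate no longer holds and the results can be biased rather than just noisier.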