Hacker News
by pvg 804 days ago
Aha ok, but if I am understanding this right, the future can change the past of this graph, right? Like our hypothetical user who first appeared in 2012 and last posted in 2016 - right now they appear in the 2016 red line but if they showed up again today and you made the graph again next year, they wouldn't be in the 2016 red line anymore. Or put another way and one that you can try: What happens if you cut off the data at 2022, 2021, 2020, 2019, 2018, etc and plotted those graphs? You'd see a different (rather than merely truncated) graph, no? Maybe even a different trend. So if my understanding is right, this is a pretty wiggly metric. The history of something you want to use as a historical trend line should not change as you append more data.

> Like our hypothetical user who first appeared in 2012 and last posted in 2016 - right now they appear in the 2016 red line but if they showed up again today and you made the graph again next year, they wouldn't be in the 2016 red line anymore

That is correct.

> What happens if you cut off the data at 2022, 2021, 2020, 2019, 2018, etc and plotted those graphs? You'd see a different (rather than merely truncated) graph, no? Maybe even a different trend. So if my understanding is right, this is a pretty wiggly metric. The history of something you want to use as a historical trend line should not change as you append more data.

I see your point, but I don't see how it is avoidable. As far as I know, any user churn metric will suffer from the same effect: if you consider a user churned after two weeks of inactivity, the result will change if you change the cut-off (the last two weeks of this month? the two weeks before them? etc.).

Even if you measure the "elapsed time" instead of "last seen", the cut-off will change your curve.

Extreme example: if you assume a user is churned after 1 year of inactivity (elapsed time since last activity), then a user who shared one story in 2007 and a second story at the end of 2023 will appear as active. If you change the cut-off from 2023 to 2022, the user will appear as inactive.
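
The example above can be sketched in a few lines of Python. The exact dates and the `is_churned` helper are made up for illustration (the comment only gives years), and the 1-year threshold is approximated as 365 days.

```python
from datetime import date, timedelta

# Hypothetical helper: "churned" means at least k has elapsed since the
# last activity that is visible at the cut-off.
def is_churned(post_dates, cutoff, k=timedelta(days=365)):
    seen = [d for d in post_dates if d <= cutoff]  # posts after the cut-off are not visible
    if not seen:
        return None  # the user has not appeared yet
    return cutoff - max(seen) >= k

# Made-up dates: one story in 2007, a second one at the end of 2023.
posts = [date(2007, 6, 15), date(2023, 12, 20)]

churned_at_2023 = is_churned(posts, date(2023, 12, 31))  # False: last seen 11 days earlier
churned_at_2022 = is_churned(posts, date(2022, 12, 31))  # True: last seen in 2007
```

The same user flips from active to churned purely because the cut-off moved, which is the effect being described.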

I don't see how it is avoidable.

You can define a metric such that future data doesn't affect past data. Here's a straightforward one: a user is inactive at time t if they haven't posted in the period between t - k and t, where k is some constant time period one picks. So let's say k is a year and you're looking at active users per year†. So in your last example, the user would be counted as active in 2007 and 2008, counted inactive in 2009 to 2022 and would count as active in 2023. If you truncate the data at 2022 nothing changes.

† year is probably too big of a window for this (I'd take something like a month) but let's stick with it for now
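
A minimal sketch of this trailing-window definition, under a few assumptions that are not in the thread: the dates are made up, k is taken as 365 days, and a user counts as active in a year if they are active at any time t within that year. The `active_years` helper is hypothetical.

```python
from datetime import date, timedelta

def active_years(post_dates, first_year, last_year, k=timedelta(days=365)):
    """Years in which the user is active at some time t, where active at t
    means they posted in the window (t - k, t]."""
    years = []
    for y in range(first_year, last_year + 1):
        start, end = date(y, 1, 1), date(y, 12, 31)
        # A post on day d keeps the user active for t in [d, d + k);
        # the year counts if that interval overlaps [start, end].
        if any(d <= end and d + k > start for d in post_dates):
            years.append(y)
    return years

# Made-up dates: one story in 2007, a second one at the end of 2023.
posts = [date(2007, 6, 15), date(2023, 12, 20)]

full = active_years(posts, 2007, 2023)                  # [2007, 2008, 2023]

# Truncate the data at the end of 2022 and recompute.
truncated_posts = [d for d in posts if d <= date(2022, 12, 31)]
truncated = active_years(truncated_posts, 2007, 2022)   # [2007, 2008]
```

Each year's value depends only on posts made up to the end of that year, so the truncated series is a prefix of the full one rather than a different curve.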

In a SaaS product, these are different issues:

1. I am making a proxy for churn (last seen == end of subscription in this product).

2. You are looking for active users (yearly / monthly / ...)

I think your suggestion (point 2) is definitely an important view, but it doesn't conflict with the need for point 1.

Sorry if this is self-evident, I will just leave this link for reference on such metrics: https://userpilot.com/blog/product-engagement-metrics/

In any case, I am happy to help: if you would like an export of the data, or the DB dump, let me know. And I am very much looking forward to your analysis :)