| HN Mirror

by aaronjg 5227 days ago

Lead data scientist at Custora here.

The problem with linear regression (or any machine learning technique) in CLV prediction is extrapolation. Since we are making a predictive measure, we are projecting out what customers will spend in the future.

You can't construct a valid training dataset to project a 2-year CLV if your business is not two years old. You need to make some assumptions about how customers' ordering will continue. We have found that assuming customer behavior follows the latent attrition model is a robust assumption for most of our clients.

Even if the business has been around for a long time, we have found that customer behavior tends to change over time. Early adopters are often far more valuable than more recent customers, so using them as the training set leads to misleading results.

To use a machine learning framework, you need to assume that customers who joined recently strongly resemble customers with similar attributes who joined in the past. We have found that this is often not a valid assumption, so instead we make simplifying assumptions about customer behavior to add power to our models.

2 comments

ced 5227 days ago

OK, that's a good point.

Have you considered hierarchical modeling, like in Bayesian Data Analysis? I would have lambda and mu drawn from per-company gamma distributions, and have the parameters of these gammas drawn from global distributions (gamma distributions themselves?)
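A rough generative sketch of the hierarchy I have in mind, with forward sampling only and no inference. The hyperprior values and the `sample_company` name are made up for illustration; only the three-level structure (global distributions, per-company gammas, per-customer lambda and mu) is the point.

```python
import random

def sample_company(rng, n_customers):
    """Draw one company's gamma parameters, then per-customer (lambda, mu) from them."""
    # Global level: hypothetical gamma hyperpriors on each company's shape and scale.
    company = {
        "lambda_shape": rng.gammavariate(2.0, 1.0),
        "lambda_scale": rng.gammavariate(2.0, 0.5),
        "mu_shape": rng.gammavariate(2.0, 1.0),
        "mu_scale": rng.gammavariate(2.0, 0.25),
    }
    # Customer level: purchase rate lambda and dropout rate mu for each customer.
    customers = [
        (rng.gammavariate(company["lambda_shape"], company["lambda_scale"]),
         rng.gammavariate(company["mu_shape"], company["mu_scale"]))
        for _ in range(n_customers)
    ]
    return company, customers

if __name__ == "__main__":
    rng = random.Random(1)
    company, customers = sample_company(rng, 3)
    print(company)
    print(customers)
```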

Also, you're using maximum likelihood. Have you done the full MCMC computations? (I don't think that it would make much of a difference - but it's nice to have empirical validation of that)

I would enjoy reading more about the HMM.

aaronjg 5227 days ago

We've considered hierarchical modeling, but concluded that there were no real gains. We have enough data from each of our clients to identify the parameters of the model. We will probably add more hierarchical modeling as we improve predictions of seasonality; it will be useful primarily for predicting the 'Christmas effect' for new clients.

The posterior mode of the Pareto/NBD obtained through full MCMC is extremely close to the MLE, and the MLE is much faster to calculate so we use MLE. [1]

There has been some work done on using HMMs to predict CLV. It turns out that in most cases the Pareto/NBD is a robust model for CLV. [2]

[1] http://dl.acm.org/citation.cfm?id=1305575

[2] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1904562

aaronjg 5227 days ago

ced: We do some post-hoc analysis to find differences on whatever dimensions our clients give us. We have found that in general, the largest effect is the month of acquisition, and so this is the only factor that we include in the model right now.

ced 5227 days ago

Thank you for the info.

One last question: do you use gender, age, and other customer-specific predictors in your model? The distribution of lambdas for men and women could vary significantly.

Estragon 5227 days ago

The model you describe in your post seems to be assuming identifiability across customers regardless of when the relationship started. If you are not making inferences about new customers based on similar customers with prior attributes, how are you doing it? The ways I can think of would force great naivete on the inferences about new customers.

aaronjg 5227 days ago

We use the gamma prior and assume that the shape parameter remains constant across time, while the scale parameter varies month to month. For each new month there are only two parameters that need to be estimated, one for the attrition rate and one for the purchase rate, so we rarely run into identifiability issues.
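As a simplified illustration of the shared-shape, per-month-scale idea, the sketch below covers only the purchase-rate half and ignores attrition entirely, so it is a plain NBD model rather than the full Pareto/NBD. It assumes lambda ~ Gamma(shape=r, rate=alpha), that every customer in a cohort is observed for the same window T, and that r has already been estimated from older cohorts. The function names, cohort labels, and counts are hypothetical. Under those assumptions the single per-month parameter has a closed-form maximum likelihood estimate.

```python
import math

def nbd_log_pmf(x, T, r, alpha):
    """log P(X = x) over a window of length T when lambda ~ Gamma(shape=r, rate=alpha)."""
    return (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(alpha / (alpha + T))
            + x * math.log(T / (alpha + T)))

def cohort_log_likelihood(counts, T, r, alpha):
    """Sum of NBD log-probabilities over the purchase counts in one cohort."""
    return sum(nbd_log_pmf(x, T, r, alpha) for x in counts)

def fit_cohort_alpha(counts, T, r):
    """Closed-form MLE of alpha for fixed r and common T: alpha = r * T / mean(counts)."""
    mean_count = sum(counts) / len(counts)
    if mean_count <= 0:
        raise ValueError("need at least one purchase in the cohort")
    return r * T / mean_count

if __name__ == "__main__":
    shared_r = 1.5  # hypothetical shape, assumed constant across months
    cohorts = {"2011-10": [0, 1, 1, 3, 5], "2011-11": [0, 0, 1, 2, 2]}  # made-up counts
    for month, counts in cohorts.items():
        alpha = fit_cohort_alpha(counts, T=1.0, r=shared_r)
        print(month, alpha, cohort_log_likelihood(counts, 1.0, shared_r, alpha))
```

With the shape fixed, each new cohort adds one well-determined number per process, which is why identifiability is rarely a problem even for a month with few customers.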
