The title of the article really hooked me. My background:
1. I have a degree in mathematics
2. I've been writing software since high school
3. I've had to do more CLV calculations than I can count
4. I've co-founded a retail startup (http://everlane.com)
So, take this critique with an open heart.
Here's a summary of the points below, for those short on attention: the article is caught between writing for a technical and a non-technical audience. There are too many technical terms for a non-technical person to make real sense of it, and not enough detail for a technical person to use the article as a reference.
First, and overall, I'm not sure who the target audience is. The topic is technical enough that I'd guess it's for the head of analytics or a data scientist inside a retail company. If that's the case, the article is really light on detail. There's nothing I can apply immediately to my work, except to Google some of the phrases in the blog to learn more. There aren't even links to external papers with more detail. If it's more for marketing people, well, it's too technical. Bayesian? Gamma distribution?
Second, assuming I'm in your target audience, give me the math! Spell it out plainly, even if you don't tell me "why" it works. For me "plainly" means "mathematically" coupled with a plain-English explanation. You can link to external papers for the "why". But I'm not afraid of a summation or an argmax. In fact, it's much easier for me to understand than someone writing it out in plain English. You throw out "gamma distribution" but don't even link to the definition, explain what it is, or explain how λ and μ fit into it. "The gamma distribution is a perfect candidate, since it characterizes most customer bases very well." Why does it characterize most customer bases very well? I really, really want to know. The fact that it's a gamma distribution is almost incidental to the deeper point about which distributions characterize which features of customer behavior, and why. Because of my background I know some of those things and can easily figure out the rest, but you're not making it easy for me.
It's like you go into detail on the soft stuff (where I care less about detail), and skimp on detail on the hard stuff (which is exactly where I want detail).
Third, numbers! The margin of error chart is mostly irrelevant. In any case, it's not detailed enough for me to use to compare these different methods, and at best serves as a way for a non-technical marketer to say, "No, the Bayesian way is better. Look at this chart." Give me a concrete example of applying this technique first, and then give me a concrete example of comparing it to the other techniques.
Fourth, the typeface on the blog is just awful. If you're producing scientific content I'd recommend using a serif typeface. Georgia is ok, but something like Garamond or Century Schoolbook is more similar to Computer Modern (the default LaTeX typeface).
Here's a brief rundown of the math; more details can be found in the papers linked below [1,2].
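We assume a latent attrition model: while alive, customers purchase with exponentially distributed interpurchase times, and at any moment they have a constant probability of dying, so lifetimes are exponentially distributed as well. We then assume that the rate parameters of these two exponential distributions (the purchase rate λ and the death rate μ) are gamma distributed across customers.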
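
To make the setup above concrete, here is a minimal Python sketch of the generative story, not our production code. The function name, the parameter names (r, alpha for the purchase-rate gamma; s, beta for the death-rate gamma) and the values are placeholders I picked for illustration.

```python
import random

def simulate_customer(r, alpha, s, beta, horizon, rng):
    """Simulate one customer's purchase times under a latent attrition model."""
    # Heterogeneity: each customer draws their own rates from gamma distributions.
    # random.gammavariate takes (shape, scale), so scale = 1 / rate parameter.
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate while alive
    mu = rng.gammavariate(s, 1.0 / beta)    # death (attrition) rate

    # Constant hazard of dying means an exponentially distributed lifetime.
    lifetime = rng.expovariate(mu)
    end = min(lifetime, horizon)

    # Exponential interpurchase times until the customer dies or the window ends.
    purchases = []
    t = rng.expovariate(lam)
    while t < end:
        purchases.append(t)
        t += rng.expovariate(lam)
    return purchases

# Illustrative parameters only (time measured in weeks, one-year window).
rng = random.Random(42)
histories = [simulate_customer(0.8, 2.0, 0.6, 3.0, 52.0, rng) for _ in range(1000)]
at_most_one = sum(len(h) <= 1 for h in histories)
print(f"{at_most_one} of {len(histories)} simulated customers bought at most once")
```

The death time is never observed directly in real data, which is why the model is called "latent" attrition: all you see is the purchase list.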
The gamma distribution is the first choice of distribution because it is the conjugate prior for the rate of the exponential distribution. For the Pareto/NBD it means that we can write the likelihood function without having to use quadrature to solve the integral. It is possible that another distribution would work even better, though it would likely be more computationally intensive.
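
As a small illustration of what conjugacy buys you, here is the simplest case in Python: a gamma prior on a purchase rate, updated with observed interpurchase times from a customer assumed to be alive throughout. This is a sketch of the textbook gamma-exponential update, not the full Pareto/NBD likelihood, and the function name and numbers are made up for the example.

```python
def gamma_posterior(shape, rate, intervals):
    """Update a Gamma(shape, rate) prior on an exponential rate.

    With n exponential observations summing to T, the posterior is
    Gamma(shape + n, rate + T): no numerical integration needed.
    """
    n = len(intervals)
    total = sum(intervals)
    return shape + n, rate + total

# Hypothetical prior and three observed gaps between purchases.
post_shape, post_rate = gamma_posterior(2.0, 4.0, [1.0, 3.0, 2.0])
posterior_mean_rate = post_shape / post_rate
print(post_shape, post_rate, posterior_mean_rate)  # 5.0 10.0 0.5
```

With a non-conjugate prior such as the log-normal, the same update would require integrating the likelihood against the prior numerically.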
Another nice property of the gamma over, say, the log-normal is that when the shape parameter is less than 1, the density diverges at zero: \lim_{x \to 0^+} f(x) = \infty, where f is the gamma density. This is a nice feature for many customer bases that have many infrequent customers, or many one-time customers.
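
If you want to see the behavior near zero numerically, a quick check along these lines works. The parameter values (gamma shape 0.5 and rate 1, standard log-normal) are arbitrary choices for illustration.

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma density, computed in log space for numerical stability."""
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)

def lognormal_pdf(x, mu, sigma):
    """Log-normal density with log-scale mean mu and std dev sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# As x shrinks, the gamma density (shape < 1) grows without bound,
# while the log-normal density falls back toward zero.
for x in (1e-1, 1e-3, 1e-6):
    print(x, gamma_pdf(x, 0.5, 1.0), lognormal_pdf(x, 0.0, 1.0))
```

In other words, a gamma with shape below 1 can put a lot of mass on purchase rates very close to zero, which the log-normal cannot do.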
For the percent error numbers, we picked a representative sample of our clients who had over two years of data, and ran the three models with the most recent year held out. We then compared the performance of the Pareto/NBD against ARPU and against picking the year-old cohort. I uploaded a boxplot of the data, which you might find more informative [3].
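
Roughly, the holdout comparison has the following shape. This is a hedged sketch rather than our actual pipeline: the tuple layout, the helper names, the absolute-percent-error definition, and the toy numbers are all assumptions made up to exercise the code, and model_a/b/c merely stand in for the three models.

```python
def split_holdout(transactions, cutoff):
    """Split (customer_id, day, amount) tuples at a cutoff day."""
    calibration = [t for t in transactions if t[1] < cutoff]
    holdout = [t for t in transactions if t[1] >= cutoff]
    return calibration, holdout

def percent_error(predicted, actual):
    """Absolute percent error of a prediction against the realized value."""
    return 100.0 * abs(predicted - actual) / actual

def score_models(predictions, holdout):
    """Compare each model's predicted holdout revenue with what happened."""
    actual = sum(amount for _, _, amount in holdout)
    return {name: percent_error(p, actual) for name, p in predictions.items()}

# Toy data: fit on everything before day 365, evaluate on the rest.
transactions = [("a", 10, 20.0), ("a", 400, 30.0), ("b", 500, 70.0)]
calibration, holdout = split_holdout(transactions, cutoff=365)

# Placeholder predictions; in practice each model is fit on `calibration`.
predictions = {"model_a": 110.0, "model_b": 80.0, "model_c": 130.0}
scores = score_models(predictions, holdout)
print(scores)
```

Repeating this per client gives one percent error per model per client, which is the kind of data a boxplot summarizes.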
Happy to chat more about the math here or by email (aaron@custora.com). Also would love to hear more about your retail startup and your CLV issues around that.
[1] http://www.jstor.org/pss/2631608
[2] http://marketing.wharton.upenn.edu/documents/research/Fader_...
[3] http://blog.custora.com/custora-content/uploads/2012/02/esti... (Note, the boxplot was generated a few months ago from different data, and we've updated the numbers for the blog post)