Definitely an interesting article and one worth reading. I work a lot on viral growth and I'd like to add a counterpoint that the growth of apps is so complicated that it is basically impossible to model in a useful way.
I've found that even if I make no changes to an app, the retention and virals fluctuate quite a bit for no apparent reason, and the fluctuations are big enough that it makes long term forecasting really more of guesswork than anything else.
Also, there are second order effects that are hard to model as well. For instance, improving virals can improve retention (user A invites friend B, user A stays for longer because their friend uses it).
I've gone through the process of modeling a couple apps, and it quickly gets to a point where the relationships become circular and small variations cause exponential differences down the line.
It is important to make informed decisions about virals and retention, but I don't think such a model is the way to do it. I think it is more important to think about optionality and decision making in opaque environments rather than trying to model the unmodelable.
This multiplicative kind of model is happily amenable to time series analysis, so you can do stats to see what your numbers are and how well they fit. That's great. What's less great is the model quality, given that well-tested virality models can be found in other venues. Coffman looks at this, for instance, at http://datacommunitydc.org/blog/2013/01/better-science-of-vi.... The difference between these two types of models is that symmetries in the statement of the problem permit, or exclude, classes of solutions. Those symmetries come from assumptions about the contact graph, the most basic (and testable) assumption.
This is great stuff. I see so many startups get so excited about features and growth yet fail in their analysis of retention. Maybe I'm a bit on the data-nerd side but I love to see folks sharing their own methods for tracking and calculating this stuff. Even with so many startups basing their model on recurring revenue, it's still easy to trip up on modeling this stuff going forward.
> so many startups get so excited about features and growth yet fail in their analysis of retention
I think it's just we write less about it, not so much that we aren't aware of its importance. "How we managed to retain our users for 4 months" sounds admittedly less sexy than "How we got a bazillion users in less than 72 hours", but the truth is a tech startup with no strong retention strategy is basically dead in the water, and generally folks know this.
The number of customers at time t is the (real-valued
function of a real variable) y(t).
We assume that at the present t = 0 and that we have
y(0), that is, the current number of customers.
We let the number of customers who will ever try our
business be b. That is, b is our intended 'market
potential'.
Initially we assume that once we get a person as a
customer, we do not ever lose them but keep them
forever.
As usual, we let y'(t) = dy(t)/dt be the calculus
first derivative of y(t). Then y'(t) is the number of
new customers per day, that is, the 'rate' at which
we gain customers.
For 'virality' we notice that this rate is proportional
to (1) the number of customers y(t) we have
'talking' about our business and (2) the number of
people
b - y(t)
not yet our customers hearing the talking.
Then we have that for some constant of
proportionality k
y'(t) = k y(t) (b - y(t))
So we have an initial value problem (that is, we
know y(0)) for a first order (we use only the first
derivative) ordinary (no partial derivatives)
differential equation. Separating variables and
integrating gives the explicit solution
y(t) = b y(0) / (y(0) + (b - y(0)) exp(-b k t))
This solution grows (1) initially slowly, (2)
then more rapidly, (3) then more slowly and
approaches b asymptotically from below.
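A minimal Python sketch that evaluates the explicit logistic solution of y'(t) = k y(t) (b - y(t)) and cross-checks it against a plain fourth-order Runge-Kutta integration of the equation itself. The function names and the numbers in the demo are hypothetical, chosen only for illustration.

```python
import math


def logistic(t: float, y0: float, b: float, k: float) -> float:
    """Explicit solution of y'(t) = k y(t) (b - y(t)) with y(0) = y0."""
    return b * y0 / (y0 + (b - y0) * math.exp(-b * k * t))


def rk4(y0: float, b: float, k: float, t_end: float, steps: int = 10_000) -> float:
    """Integrate the same equation numerically with classical Runge-Kutta."""
    def f(y: float) -> float:
        return k * y * (b - y)

    h = t_end / steps
    y = y0
    for _ in range(steps):
        s1 = f(y)
        s2 = f(y + h * s1 / 2)
        s3 = f(y + h * s2 / 2)
        s4 = f(y + h * s3)
        y += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
    return y


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 customers now, market potential 1,000,000,
    # and k chosen so that b * k = 0.01 per day.
    y0, b, k = 1_000.0, 1_000_000.0, 1e-8
    for t in range(0, 2001, 250):
        print(f"day {t:5d}: closed form {logistic(t, y0, b, k):10.0f}"
              f"  rk4 {rk4(y0, b, k, t):10.0f}")
```

The Runge-Kutta loop is only a sanity check; once the closed form is trusted, a pocket calculator (or one line of code) per point is all that is needed.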
In case we lose some customers forever at some rate
r, then we get the same solution except k and b get
adjusted.
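One way to see the adjustment, assuming (our assumption; the loss model is not spelled out above) that each customer leaves independently at a constant rate r per day, so losses are r y(t) per day:

```latex
y'(t) = k\,y(t)\,\bigl(b - y(t)\bigr) - r\,y(t)
      = k\,y(t)\,\Bigl(\bigl(b - \tfrac{r}{k}\bigr) - y(t)\Bigr)
```

So under this assumption the equation keeps exactly the same form with b replaced by b - r/k, and the effective growth rate b k becomes b k - r. This only gives growth when r < b k; if r is at least b k, y(t) decays toward 0 instead.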
Once there was a startup (now a major company) that
was struggling and had as an investor a major
company that held a Board seat and had two
representatives at the startup, one in finance and
the other in aeronautical engineering.
The two representatives had asked for some revenue
growth projections.
People around the HQ considered what the startup
hoped, intended, thought might happen, etc., but
found nothing credible.
One guy who remembered calculus reluctantly got
involved, formulated and solved the differential
equation above, and showed the solution to a Senior
VP of Planning (SVP) who reported to the founder,
CEO, and COB. The SVP was responsible for the
projections. The SVP took the guy's calculus
solution as the basis of the projections and on a
Friday sat with the guy with a pocket calculator and
some graph paper and graphed solutions to the
differential equation for selected values of the
constant k and picked one of the solutions as the
official projection.
The next day, Saturday, at about noon, the guy was
in his office working on some other math problems
and got a call from a person asking if he knew about
the projections for the Board and if he could come
over to the HQ? Sure. When the guy arrived, the
situation was grim: The two representatives of the
major Board Member were standing in the hall with
their bags packed and airline tickets back to
Texas in hand. The startup was about to die.
The SVP was traveling and out of town.
The person who had called got the graph of
projections from the previous day and asked the guy
to reproduce a point on the graph. Using the
calculator, the solution above, and a few
keystrokes, the point on the graph was reproduced.
After several more points were reproduced, the mood
improved; the two representatives on the Board
stayed, and the startup was saved.
Later the person who had called explained that there
had been a Board meeting that Saturday, the growth projection
graph was shown, and the two representatives had
asked how the projections were calculated. The rest
of the company tried to reproduce the graph but
could not. The Board meeting stopped. The two
representatives lost patience with the startup, got
airline tickets back to Texas, returned to their
rented rooms, packed their bags, and as a last
chance returned to the startup to see if there was
an answer to how the projections were calculated.
Ah, one saved startup! One reason to take calculus
seriously!
Note that with this derivation, if we accept the assumptions (which obviously do not always hold), then all there is to 'viral' growth is three numbers: the current number of customers y(0), the eventual number of customers b, and the constant k. This also holds when some customers leave and never come back (just with some adjustments to b and k).
For k, we might fit it to past data. For given y(0) and b, all k does is adjust how fast the curve rises to the asymptote. So basically all we are doing is interpolating between y(0) and b.
Otherwise, all viral curves are the same.
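One possible way to fit k to past data, sketched under the assumption that b is already known and every past count lies strictly between 0 and b. Taking ln(y/(b - y)) of the logistic solution y(t) = b y(0) / (y(0) + (b - y(0)) exp(-b k t)) gives ln(y(0)/(b - y(0))) + b k t, which is linear in t, so an ordinary least-squares line through the transformed counts has slope b k. The names fit_k and logistic are hypothetical, and the demo uses synthetic counts generated from the model itself, not real data.

```python
import math


def logistic(t: float, y0: float, b: float, k: float) -> float:
    """Explicit solution of y'(t) = k y(t) (b - y(t)) with y(0) = y0."""
    return b * y0 / (y0 + (b - y0) * math.exp(-b * k * t))


def fit_k(ts: list[float], ys: list[float], b: float) -> float:
    """Estimate k from past counts, assuming b is known and 0 < y < b.

    The logit z = ln(y / (b - y)) is linear in t with slope b * k, so an
    ordinary least-squares line through (t, z) recovers b * k.
    """
    zs = [math.log(y / (b - y)) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    return (sxz / sxx) / b


if __name__ == "__main__":
    # Synthetic weekly counts generated from the model itself (not real data).
    true_k, b, y0 = 2e-8, 500_000.0, 200.0
    ts = [7.0 * week for week in range(20)]
    ys = [logistic(t, y0, b, true_k) for t in ts]
    print(fit_k(ts, ys, b))
```

On real, noisy counts the log transform reweights the errors, so a direct nonlinear least-squares fit (for example SciPy's curve_fit) may be preferable, and b itself is usually uncertain.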
So, an advantage of my derivation is a simple, explicit equation for a fairly general solution.
The article has a comment claiming that biology addresses a similar problem and gets a 'logistic' curve. The comment didn't say just what was meant by a logistic curve, but I suspect that my solution here is an example. If so, then here we have an 'axiomatic' derivation of the logistic curve.
It is true that the growth of some products, e.g., TV sets, looks to the eye very much like one of the curves from my solution for selected values of y(0), b, and k.
We could also make a Markov assumption: assume that we get new customers (and, if we wish, lose old customers) at some 'rates' and, thus, get a continuous time, discrete state space Markov process. Then, as is well known, the transition probabilities are given by a matrix exponential. We could evaluate the matrix exponential or just use Monte Carlo to generate a few thousand sample paths. Then we could put some confidence limits on the deterministic solution.
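A rough sketch of the Monte Carlo route, assuming (our choice; the rates are not specified above) that with n customers a new customer arrives at rate k n (b - n) and an existing one leaves at rate r n. Each path is simulated event by event (a Gillespie-style simulation), which gets slow for large b, so the demo uses a hypothetical small market. The function names are hypothetical; the matrix exponential route (e.g., SciPy's expm on the generator matrix) would be an alternative for small state spaces.

```python
import random


def sample_path_value(y0: int, b: int, k: float, r: float,
                      t_end: float, rng: random.Random) -> int:
    """Simulate one path of the birth-death chain; return the count at t_end."""
    n, t = y0, 0.0
    while True:
        up = k * n * (b - n)   # rate of gaining a customer
        down = r * n           # rate of losing a customer
        total = up + down
        if total == 0.0:       # absorbing state: nothing can change any more
            return n
        t += rng.expovariate(total)   # exponential waiting time to next event
        if t > t_end:
            return n
        n += 1 if rng.random() * total < up else -1


def monte_carlo(y0: int, b: int, k: float, r: float, t_end: float,
                paths: int = 2000, seed: int = 0) -> tuple[float, int, int]:
    """Mean and rough 5%/95% sample quantiles of the count at t_end."""
    rng = random.Random(seed)
    values = sorted(sample_path_value(y0, b, k, r, t_end, rng)
                    for _ in range(paths))
    lo = values[int(0.05 * (paths - 1))]
    hi = values[int(0.95 * (paths - 1))]
    return sum(values) / paths, lo, hi


if __name__ == "__main__":
    # Hypothetical small market so the event-by-event simulation stays fast.
    mean, lo, hi = monte_carlo(y0=10, b=500, k=2e-5, r=0.001, t_end=300.0)
    print(f"mean {mean:.1f}, 90% band [{lo}, {hi}]")
```

Comparing the band with the deterministic curve at the same t gives the kind of confidence limits mentioned above.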
Since no one guessed the war story, the startup was FedEx, the SVP was Mike Basch, the CEO, of course, was Fred Smith, the person who called on the phone was Roger Frock, and the investor was General Dynamics. The arithmetic was courtesy of an HP-35. So, HP might run an ad saying how they saved FedEx!