Let's gather info from the paper and see if what they say makes sense. In discussing Fig. 1, they seem to know this data needs to be normalized to the number of cell divisions:

> The prevalence of somatic mutations was highly variable between and within cancer classes, ranging from about 0.001 per megabase (Mb) to more than 400 per Mb (Fig. 1). Certain childhood cancers carried fewest mutations whereas cancers related to chronic mutagenic exposures such as lung (tobacco smoking) and malignant melanoma (exposure to ultraviolet light) exhibited the highest prevalence. This variation in mutation prevalence is attributable to differences between cancers in the duration of the cellular lineage between the fertilized egg and the sequenced cancer cell and/or to differences in somatic mutation rates during the whole or parts of that cellular lineage<sup>1</sup>.

They also believe these mutations accumulate at a relatively constant rate over time:

> The mutations in a cancer genome may be acquired at any stage in the cellular lineage from the fertilized egg to the sequenced cancer cell. The correlation with age of diagnosis is consistent with the hypothesis that a substantial proportion of signature 1A/B mutations in cancer genomes have been acquired over the lifetime of the cancer patient, at a relatively constant rate that is similar in different people, probably in normal somatic tissue

So now let's implement their model with the required assumptions:

- Define the probability that a mutation occurs during a given cell division as $p$.
- Define the probability that it does not occur during a given cell division as $q = 1 - p$.
- Define the number of accumulated mutations required for carcinogenesis as $n$.
- Define the number of cell divisions that have passed since the zygote as $d$.
- Define the number of cell lineages in the tissue as $N_{cell}$.
- Define the proportion of cancer cells that go on to form detectable tumors as $C$.
- Assume each mutation can only occur once per cell.
- Assume all mutations occur at the same rate (i.e. $p_1 = p_2 = \dots = p_n$).
The probability that a mutation does not occur during division 1, nor division 2, ..., nor division $d$ is then $q^d$ (since $p$ is constant, we simply multiply the probabilities as for independent events). The probability that the mutation did occur at some point up to division $d$ must then be $1 - q^d$, and for the $n$ required mutations we get

$$F(d) = (1 - q^d)^n$$
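A quick way to sanity-check $(1 - q^d)^n$ is to simulate it. The sketch below treats each of the $n$ mutations as an independent geometric waiting time (in divisions) and counts the fraction of simulated lineages that have acquired all of them by division $d$. The values of `p`, `d` and the number of simulated lineages are arbitrary choices made for speed, not values from the paper.

```r
# Monte Carlo check of (1 - q^d)^n, assuming independent geometric waiting times.
set.seed(1)

p <- 0.01       # illustrative rate (assumption), larger than in the text so it runs fast
q <- 1 - p
n <- 2          # number of required mutations
d <- 150        # divisions since the zygote (assumption)
n_sims <- 1e5   # number of simulated lineages (assumption)

# rgeom() counts failures before the first success, so the division at which
# a given mutation first occurs is rgeom() + 1.
first_hit <- matrix(rgeom(n_sims * n, p) + 1, ncol = n)

# A lineage counts as cancerous by division d once all n mutations have occurred,
# i.e. once the latest of its n waiting times is <= d.
sim_cdf <- mean(apply(first_hit, 1, max) <= d)
theory_cdf <- (1 - q^d)^n

cat("simulated:", sim_cdf, " analytic:", theory_cdf, "\n")
```

With 10^5 lineages the two numbers should agree to within a few thousandths.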
We just derived the CDF of the geometric distribution, extended to allow for multiple parallel events (strictly, the CDF of the largest of $n$ independent geometric waiting times). This is the cumulative probability of a cell lineage turning cancerous according to the mental model they describe in the paper, which is pretty much Armitage-Doll without mentioning the name. To get the probability of a cell lineage turning cancerous at a given age, measured in divisions (i.e. the pdf of this distribution), we take the first derivative of that function with respect to $d$ (warning: this is a continuous approximation of a discrete process):

$$f(d) = -n\,q^d \ln(q)\,(1 - q^d)^{n-1}$$
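To confirm the derivative, and to get a feel for how good the continuous approximation is, the sketch below compares the closed-form pdf with a central finite difference of the CDF $(1 - q^d)^n$, and with the discrete per-division probability obtained by differencing the CDF at consecutive divisions. The step size `h` and the grid of `d` values are arbitrary choices.

```r
# Compare the closed-form pdf with numerical derivatives of the CDF.
p <- 1e-4; q <- 1 - p; n <- 2

cdf_fn <- function(d) (1 - q^d)^n
pdf_fn <- function(d) -n * q^d * log(q) * (1 - q^d)^(n - 1)  # log() is the natural log

d <- seq(100, 20000, by = 100)

# Central finite difference of the CDF (step size h is an arbitrary choice).
h <- 1e-3
numeric_pdf <- (cdf_fn(d + h) - cdf_fn(d - h)) / (2 * h)
max_abs_err <- max(abs(numeric_pdf - pdf_fn(d)))

# Discrete process: probability the lineage completes its n-th mutation
# exactly at division d, versus the continuous approximation.
discrete_pdf <- cdf_fn(d) - cdf_fn(d - 1)
discrete_err <- max(abs(discrete_pdf - pdf_fn(d)))

print(c(finite_difference = max_abs_err, discrete = discrete_err))
```

The finite-difference error is at the level of floating-point noise, and the discrete/continuous gap is tiny compared with the pdf values themselves for a small `p`.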
The expected number of cases per person after $d$ divisions (the division-specific incidence rate) would then be

$$C \, N_{cell} \left(-n\,q^d \ln(q)\,(1 - q^d)^{n-1}\right)$$
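The incidence expression can be written as a small R function to see how $C$ and $N_{cell}$ enter. The sketch below evaluates it for two made-up combinations of `C` and `Ncell` (placeholders, not estimates) with the same `p` and `n`.

```r
# Expected cases per person at division d: C * Ncell * pdf(d).
incidence <- function(d, p, n, C, Ncell) {
  q <- 1 - p
  C * Ncell * (-n * q^d * log(q) * (1 - q^d)^(n - 1))
}

d <- 1:20000
# Two arbitrary (hypothetical) choices of C and Ncell, same p and n.
a <- incidence(d, p = 1e-4, n = 2, C = 1e-6, Ncell = 1e8)
b <- incidence(d, p = 1e-4, n = 2, C = 1e-3, Ncell = 1e10)

# The curves differ only by a constant factor, so they peak at the same division.
ratio <- b / a
print(c(peak_a = which.max(a), peak_b = which.max(b)))
print(range(ratio))
```

Both curves peak at the same division and their ratio is the constant $(10^{-3} \cdot 10^{10}) / (10^{-6} \cdot 10^{8}) = 10^5$.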
You can see that only the height of the curve is affected by $C$ and $N_{cell}$; the shape is independent of those factors. In the (non-simplified) Armitage-Doll model, the shape of the curve depends only on the mutation rate and the number of required mutations.

The paper reports roughly $10^{-9}$ to $10^{-4}$ cancer-specific mutations per bp in already detected tumors (0.001 per Mb is $10^{-9}$ per bp; 400 per Mb is $4 \times 10^{-4}$ per bp). If those mutations accumulated over 10 divisions, the mutation rate would be $10^{-10}$ to $10^{-5}$ mutations/bp/division, and so on for longer lineages. Since every tumor is at least one division removed from the zygote, these values are empirically determined upper bounds on the mutation rate. So let's use the higher of the two as our value of $p$. Let us also assume that only $n = 2$ mutations need to accumulate to result in a detectable tumor. Using R to make the upper plot:

```r
# Upper-bound mutation rate (from melanoma) and two required mutations
p <- 1e-4; q <- 1 - p; n <- 2
d <- 1:20000  # divisions since the zygote
# pdf of the number of divisions needed for a lineage to acquire all n mutations
plot(d, -n * q^d * log(q) * (1 - q^d)^(n - 1), type = "l",
     xlab = "Divisions since Zygote", ylab = "Pr(a Cell Lineage Will Turn Cancerous)")
# Predicted peak, where q^d = 1/n
abline(v = log(1/n, base = q))
```
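The position of the vertical line drawn by `abline()` can be derived by setting the derivative of the pdf (i.e. the second derivative of the CDF) to zero. Writing the pdf as $f(d) = -n \ln(q)\, x(1-x)^{n-1}$ with $x = q^d$, so that $dx/dd = x \ln q$:

$$\frac{df}{dd} = -n \ln(q)\,\frac{dx}{dd}\,\frac{d}{dx}\!\left[x(1-x)^{n-1}\right] = -n\,(\ln q)^2\,x\,(1-x)^{n-2}\,(1 - nx)$$

For $0 < x < 1$ and $n \ge 2$ this vanishes only when $1 - nx = 0$, i.e. $q^d = 1/n$, so the peak is at

$$d^* = \frac{\ln(1/n)}{\ln q} = \log_q(1/n)$$

which is `log(1/n, base = q)` in R. For $p = 10^{-4}$ and $n = 2$ this gives $\ln 2 / \left(-\ln 0.9999\right) \approx 6931$ divisions.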
![Upper: model pdf for p = 10^-4, n = 2, with the predicted peak marked; lower: melanoma age-specific incidence](https://s14.postimg.org/p6wncjv9d/melan.jpg)

Actually, by setting the second derivative of that CDF to zero, we can see that the Armitage-Doll model predicts a peak in age-specific incidence at $\log_q(1/n)$ divisions (vertical line on the upper plot). That $10^{-4}$ value comes from melanoma, so let us also look at the age-specific incidence for that cancer (lower plot). There we see the
peak incidence occurs at ~90 years of age. So according to their model, the skin cells that give rise to melanoma must be ~7k divisions removed from the zygote, corresponding to an average of ~78 divisions each year, or one every ~5 days. Is that what happens?

Remember, we used an extreme upper bound on the mutation rate from their data, and only 2 required accumulated mutations. Even then we end up with cells that are ~7,000 generations (~78 per year) removed from the zygote before becoming cancerous. What you will find is that the division rates required to fit what people actually suggest (e.g. $p = 10^{-7}$ and $n = 3$) are insane according to the accepted model. If they have a different model than that, why do they not write it down and compare it to epidemiological data?
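The division-rate arithmetic above can be reproduced with a short R sketch. It assumes, as the text does, that the incidence peak sits at `log(1/n, base = q)` divisions and that those divisions are spread evenly over the years up to the peak age. The helper names `peak_divisions()` and `implied_rate()` are made up for illustration, and reusing the ~90-year melanoma peak for the $p = 10^{-7}$, $n = 3$ case is purely an assumption.

```r
# Hypothetical helpers: peak location and implied division rate.
peak_divisions <- function(p, n) log(1 / n, base = 1 - p)

implied_rate <- function(p, n, peak_age_years) {
  d_star <- peak_divisions(p, n)
  per_year <- d_star / peak_age_years   # assumes divisions are evenly spread over life
  c(divisions_to_peak = d_star,
    divisions_per_year = per_year,
    days_per_division = 365.25 / per_year)
}

# Melanoma scenario from the text: p = 1e-4, n = 2, peak at ~90 years.
melanoma <- implied_rate(p = 1e-4, n = 2, peak_age_years = 90)

# p = 1e-7, n = 3 scenario; the 90-year peak age is an assumption.
typical <- implied_rate(p = 1e-7, n = 3, peak_age_years = 90)

# Cross-check the closed-form peak against the maximum of the plotted curve.
p <- 1e-4; q <- 1 - p; n <- 2; d <- 1:20000
grid_peak <- d[which.max(-n * q^d * log(q) * (1 - q^d)^(n - 1))]

print(round(melanoma, 2))
print(signif(typical, 3))
print(grid_peak)
```

With these inputs the melanoma case comes out at about 6,930 divisions, roughly 77 per year or one every ~4.7 days, while the $p = 10^{-7}$, $n = 3$ case needs about $1.1 \times 10^7$ divisions, i.e. more than $10^5$ per year, or one every few minutes.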