by XorNot 1334 days ago
There's a line in this paper, cited a few times in this thread, that really stands out to me:

> 100,000 random in silico mutants were generated for both RaTG13 and BANAl-20-52...Only 1.2% of RaTG13 mutants resulted in a BsaI/BsmBI restriction map with a larger z-score than SARS-CoV-2. BANAL52 is the closer relative to SARS-CoV-2 by over 200 nucleotides, yet only 0.1% of mutants yielded z-scores as great or greater than SARS-CoV-2

1.2% of 100,000 is a weird way to express an occurrence rate, because 1.2% is a little over 1 in 100. That means random perturbation alone generated about 1,200 candidates which would also match their conclusions. The 0.1% mutant number would still be about 100, by sheer random chance.
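
To make those counts explicit, here is a small sketch that converts the quoted percentages back into numbers of simulated mutants. The function names are mine; the only inputs are the 100,000, 1.2% and 0.1% figures from the quoted passage.

```python
# Counts implied by the percentages quoted from the paper.
# N_MUTANTS and both fractions come from the quoted passage; nothing else is assumed.
N_MUTANTS = 100_000  # in silico mutants generated per parent genome


def implied_count(fraction: float, n: int = N_MUTANTS) -> int:
    """Number of mutants corresponding to a fraction of n simulated mutants."""
    return round(fraction * n)


def one_in(fraction: float) -> float:
    """Express a fraction as '1 in X'."""
    return 1.0 / fraction


ratg13_count = implied_count(0.012)  # mutants of RaTG13 with a larger z-score
banal_count = implied_count(0.001)   # mutants of BANAL-20-52 with z-score as great or greater

print(f"RaTG13: {ratg13_count} mutants, about 1 in {one_in(0.012):.0f}")
print(f"BANAL-20-52: {banal_count} mutants, about 1 in {one_in(0.001):.0f}")
```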

They also don't support this argument with any reference to the observed mutation rates of any of their candidates in the environment. How often, in nature, do we expect a new mutation to arise? They claim to use data for nucleotide substitution frequencies, so they're addressing the fact that mutation is a process with a temporal component, but not what the time span is.

If 1 in 100 commercial jetliners crashed every year, we'd regard that as so common as to make commercial aviation unsafe.

They then conclude:

> It’s unlikely such an idealized reverse genetic system would evolve by chance from the close relatives of SARS-CoV-2

From 1 in 100 occurrence rates? Of a virus?

Now we do in fact have data on how frequently viral mutation happens. In cell culture the rate appears to be about 9e-7 substitutions per nucleotide per replication cycle of 12 hours for RaTG13 (reproduced from culture in animal tissue too[2]). With a ~29,800 bp genome, that means about 0.027 nucleotide substitutions on average per replication cycle. So roughly 19 days for a single virus strain, serially replicating, to substitute 1 base pair.
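
As a check on that arithmetic, here is a minimal sketch that multiplies the quoted per-nucleotide rate by the quoted genome length without rounding. The names are mine, and the inputs are just the figures given in this comment (9e-7, ~29,800 bp, 12 hours), not independently sourced values. Unrounded, the product is about 0.027 substitutions per cycle, which works out to roughly 37 cycles, or a bit under 19 days. Rounding the per-cycle figure down to 0.02 stretches that to 50 cycles, or 25 days. Either way the order of magnitude is the same.

```python
# Figures quoted in the comment above; treated here as given, not verified.
SUBS_PER_NT_PER_CYCLE = 9e-7  # substitutions per nucleotide per replication cycle
GENOME_LENGTH_NT = 29_800     # approximate genome length in nucleotides
HOURS_PER_CYCLE = 12          # replication cycle length in hours


def expected_subs_per_cycle(rate: float, genome_length: int) -> float:
    """Expected substitutions per genome per replication cycle."""
    return rate * genome_length


def days_per_substitution(subs_per_cycle: float, hours_per_cycle: float) -> float:
    """Mean days for one serially replicating lineage to gain one substitution."""
    cycles = 1.0 / subs_per_cycle
    return cycles * hours_per_cycle / 24.0


subs = expected_subs_per_cycle(SUBS_PER_NT_PER_CYCLE, GENOME_LENGTH_NT)
print(f"substitutions per cycle: {subs:.4f}")
print(f"days per substitution (unrounded): {days_per_substitution(subs, HOURS_PER_CYCLE):.1f}")
print(f"days per substitution (using 0.02): {days_per_substitution(0.02, HOURS_PER_CYCLE):.1f}")
```

This only covers a single lineage replicating serially; it says nothing about how many genomes are replicating at once.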

Of course, this is all ignoring the fact that viral mutation is highly parallel - which they also do not mention in reference to this conclusion. How many mutations are necessary to generate a z-score match which would fool their detection method? 1, 2, 10? How many potential restriction enzyme cleavage sites exist which would become cleavage sites by a single base-pair flip? What's the dynamic addition/removal rate of restriction enzyme sites expected to be? We have per nucleotide estimates for this for RaTG13, so it's also not an independent variable: to achieve the distribution of cleavage sites they propose, what is the mean number of mutations for it to happen in the 1 in 100 candidates which achieved it? A virus doesn't explore its mutation space serially, it explores it in a massively parallel way every single replication cycle.

This paper leads with some very specific claims, is based purely on simulation, and fails to ask obvious control questions based on its own methods.

[1] https://doi.org/10.1371/journal.ppat.1000896
[2] https://doi.org/10.1371/journal.ppat.0030005

2 comments

But the thing they're evaluating does not have an evolutionary benefit, so the fact that many possible mutated viruses exist does not mean it's any more likely that this specific pattern would be observed in a specific pandemic virus.
But the actual question is not how likely it is for those naturally occurring mutations (which also happen to be exactly like the commercial ones used for cutting genomes) to occur at the exact positions you would expect them to be in engineered genomes.

The real question is how probable it is that such a naturally mutated virus actually develops into a pandemic. First it has to emerge from nature, and then it also has to be lucky enough to cause a pandemic.

You have to multiply those two probabilities as well. Only then do you have a model that can be compared to what happened with SARS-CoV-2. Is it still that likely to be a natural spillover?