by oasisbob 1368 days ago
I think it might be productive to dive into this part:

> Loosely, that's the observation that (stochasticity of spread aside) we'd expect the earlier lineage A to have more and more diverse descendants than the later lineage B. Their epi model in "Separate" is a formalization of that, and if they could correctly and confidently model that spread then I believe it would be sound.

Yeah, that's the observation. However, you're invoking the epi model at the wrong time. If you read `Inferring the MRCA...`, all of this is already known and observed before the modeling is even run. The epi model doesn't contain these results. They constructed their SC2 tree, then brought it over to the epi model to play with it.

If you want a "formalization" of that observation, perhaps Table I will do.

The results are best read in order.

If you're trying to better understand the phylodynamic model, perhaps "Inference of Viral Evolutionary Rates from Molecular Sequences" by Drummond would be interesting.


I think you're failing to appreciate the reason why they built the "Separate" model. Their headline claim is that SARS-CoV-2 arose from two zoonotic introductions into humans. If you want to express that claim in terms of the real pandemic's tree, then the relevant tree is the tree in humans only, which would then have two roots.

The construction of such a tree inherently depends on our assumptions about the epi dynamics. For example, if you give me a hundred genomes and I propose a hundred roots, then that wouldn't usually be a very good tree; but if the disease in question were known to spread animal-to-human but not human-to-human, then that might be correct. Nothing in their "Inferring" model allows them to incorporate such clearly relevant information, which seems like an obvious deficiency.

To put it another way, you write:

> If you read `Inferring the MRCA...`, all of this is already known and observed before the modeling is even run.

After "Inferring", I believe they know the real tree has structure that's obviously non-modal (i.e., not the most likely outcome) given any single introduction. I don't see how they'd know whether it's a p = 20% non-modal or p = 0.5% non-modal outcome without an epi model like "Separate", or some kind of ugly incorporation of the epi dynamics into BEAST that they wisely didn't attempt.

I believe that's why the authors built "Separate", and its basic form is good work. (If you don't, then why do you think they spent their time on that?) I just disagree with their parameter choices and excessive confidence in their result.

As to your other reply, I agree the 10% is a rough number that doesn't consider mutation biases and such. That's just the probability in a single transmission, though; it's also possible (and more likely) that the two lineages formed in humans with intermediate lineages that went extinct before they could be sampled. I think we at least agree that timing alone is insufficient to exclude evolution of the two lineages in humans, even assuming a December introduction? I'm just trying to confirm that none of the evidence you see for two introductions in "Inferring" comes from its tMRCA.

ploink

Enjoy your sealioning.

Sorry; maybe I'm too stupid or lazy, but I genuinely don't get your point. Is it just that when they construct the tree in "Inferring", it looks qualitatively surprising (non-modal) given any single introduction, assuming (as I do as well) that A predates B? But we've known that for literally years now. As I understand the paper, their novel contribution is to quantify how surprising that looks, whether it's p ~ 20% surprising (which wouldn't mean much) or their claimed p ~ 0.5%. That's what they do in "Separate", and it correctly and inherently depends on the epidemiological modeling that I don't trust.
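
To make "quantify how surprising" concrete, here is a minimal sketch of the generic technique: count how often simulated epidemics reproduce the observed tree structure, and attach a binomial (Wilson) interval to that frequency. This is not the paper's code. The function names and the example counts are hypothetical, and the boolean "does this simulation match?" test is assumed to come from elsewhere.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("need at least one simulation")
    p_hat = hits / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def match_frequency(matches):
    """Fraction of simulations flagged as reproducing the observed structure.

    `matches` is an iterable of booleans, one per simulated epidemic
    (hypothetical input; how a match is decided is not modeled here).
    """
    matches = list(matches)
    n = len(matches)
    hits = sum(1 for m in matches if m)
    return hits / n, wilson_interval(hits, n)

# Hypothetical counts chosen only for illustration: 5 matches in 1000 runs.
example = [True] * 5 + [False] * 995
freq, (lo, hi) = match_frequency(example)
print(f"frequency={freq:.4f}, interval=({lo:.4f}, {hi:.4f})")
```

Note that the interval only covers Monte Carlo noise for a fixed simulator. It says nothing about whether the epidemiological parameters fed into the simulator are right, which is the part I'm disputing.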

Again, in the Twitter thread that you yourself linked, Worobey says:

> This [the real polytomy structure] is something that [we] DO NOT see in ~99.5% of simulations. That is the crux of the paper.

The simulations in question are the epidemiological simulations from "Separate". You've told me to disregard Worobey's comments here; but while it's possible that Worobey has misunderstood the significance of his own paper, it seems more likely to me that you have.