Yeah, I think we're basically on the same page with their methodology and models now. I didn't realize you were nicknaming the models after the result titles they were applied to, so I was quite confused, especially since we both used those words in the quoted sections; it sounded like you were referring to portions of our conversation. So yeah, we were talking right past each other.
No, the two models don't correspond cleanly to the results. For example, when the authors claim "Separate introductions of lineages A and B" in the results, they provide evidence from both. (They're presenting the results of the models in support of their phylogeny.) I agree that "Inferring the MRCA of SARS-CoV-2" is pretty much independent of the epi stuff.
> As to "Separate", I believe that's incorrect. That model begins with an SIR-type simulation, and outputs the shape (polytomy structure) of the phylogenetic tree of that simulated pandemic, which they compare against the shape of the real pandemic's phylogenetic tree. Do you disagree? If so, what do you believe is the output of that "Separate" model?
I thought we were past this. We both agree that one of the results of the epi simulations was sampled genetics and a resulting tree from the simulation. That doesn't mean that their phylogeny is the direct result of their epi simulations. Their simulations are in support of their phylogeny. Their theorized phylogeny essentially existed prior to the modeling, which is why I called them separate, i.e., independent. The `Materials and methods summary` is quite clear, especially `Phylodynamic inference and epidemic simulations`.
edit: Our thread is getting too deep for HN, so I might not be able to reply. I'll try to keep an eye out for new replies if you want to fork off somewhere else. But where's your horse in this race? You say a lot about what you think sucks and very little about what you actually believe here.
> I agree that the "Inferring" model does not depend on the epidemic simulation. I don't believe the "Inferring" model provides significant support for two introductions though. I believe that's the reason why most public debate has been about "Separate".
Funny. My theory is that most people don't have enough knowledge of molecular genetics to make heads or tails of the paper, and so are of course silent on those results. They didn't follow the debate over the past few years, and are showing up trying to understand something without the context or the requisite knowledge. When you say "public debate", you need to admit you're talking about a particular part of a particular website or two, where a small number of people are picking at nits and can't even address the core of the findings the authors present here.
So I guess we were also talking past each other on "Separate". By "simulated phylogenetic tree", I've always meant "phylogenetic tree for one of their simulated pandemics", not a tree for the real pandemic. We also agree that Pekar's argument isn't based on the time necessary for the two lineages to evolve in humans, since a difference at least that large could arise (with p ~ 10%) even within a single human-to-human transmission.
So to exclude evolution of the two lineages in humans, they needed something else. Loosely, that's the observation that (stochasticity of spread aside) we'd expect the earlier lineage A to have more numerous and more diverse descendants than the later lineage B. Their epi model in "Separate" is a formalization of that, and if they could model that spread correctly and confidently, I believe the argument would be sound.
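To make that intuition concrete, here's a minimal toy sketch (my own, not Pekar et al.'s model or code) of two lineages spreading as independent branching processes, where each case's number of secondary cases is negative binomial with mean R0 and dispersion k. The parameter values (R0 = 2.5, k = 0.5, six generations, a three-generation head start) and all function names are arbitrary assumptions for illustration; the real "Separate" analysis uses a much richer SIR-type simulation. Among runs where both lineages are still circulating at the end, it estimates how often the earlier-seeded lineage has more cumulative cases.

```python
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def next_generation(rng: random.Random, infectors: int, r0: float, k: float) -> int:
    """Total secondary cases caused by `infectors` cases.

    Each case's offspring count is negative binomial (mean r0, dispersion k),
    drawn as a gamma-Poisson mixture. The sum of `infectors` independent
    Gamma(k, scale=r0/k) rates is one Gamma(infectors * k, scale=r0/k) draw.
    """
    if infectors == 0:
        return 0
    rate = rng.gammavariate(infectors * k, r0 / k)
    return poisson(rng, rate)


def run_lineage(rng: random.Random, generations: int, r0: float, k: float):
    """Seed one case and spread it for `generations` generations.

    Returns (cumulative cases, cases in the final generation).
    """
    current = total = 1
    for _ in range(generations):
        current = next_generation(rng, current, r0, k)
        total += current
    return total, current


def earlier_lineage_larger(delay, generations=6, r0=2.5, k=0.5, runs=2000, seed=1):
    """Among runs where both lineages still have cases in the final generation,
    return the fraction in which the lineage seeded `delay` generations earlier
    has strictly more cumulative cases (ties count as not larger)."""
    rng = random.Random(seed)
    wins = both_alive = 0
    for _ in range(runs):
        total_a, last_a = run_lineage(rng, generations, r0, k)          # earlier lineage
        total_b, last_b = run_lineage(rng, generations - delay, r0, k)  # later lineage
        if last_a > 0 and last_b > 0:
            both_alive += 1
            if total_a > total_b:
                wins += 1
    if both_alive == 0:
        raise ValueError("no runs where both lineages survived; increase runs")
    return wins / both_alive


if __name__ == "__main__":
    print("3-generation head start:", earlier_lineage_larger(delay=3))
    print("no head start:", earlier_lineage_larger(delay=0))
```

With these toy settings the head start usually, but not always, wins, which is the "stochasticity of spread aside" caveat in miniature. How often it fails depends heavily on R0, k, sampling, and the delay, which is exactly where the modeling assumptions do their work. The gamma-Poisson mixture is only there to keep the sketch within the standard library.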
It seems we disagree about what forms the paper's core result, though. I'm taking my own cue from Worobey's Twitter comments, because (a) he's an author, so he presumably knows better than most, and (b) while I disagree with his conclusion, I can follow his argument. In the thread that you linked and I quoted, he describes the result of that "Separate" model (which fundamentally depends on the epi stuff) as the crux of the paper. That makes sense to me.
I believe you prefer to think in terms of constructing the phylogenetic tree for the real pandemic, framing the number of introductions as the number of roots of that tree. That's equivalent in a certain sense, but it seems much less intuitive to me. The "Separate" approach makes the epidemiological assumptions explicit. Those assumptions are always relevant, including when you frame the problem in terms of the real tree; they're just much harder to express there in the parameters (R0, serial interval, dispersion parameter k, etc.) typically used to model a pandemic.
When they built the real tree, they observed that any single root fits badly. (Per your other comment, I agree that's what they did in "Inferring" with BEAST.) More roots would fit better, but that's always true for any phylogeny unless there's a penalty for each additional root, since adding roots can only improve the other usual measures of fit. Without quantifying what that per-root penalty should be, it's impossible to say whether the poor fit is because the tree really should have two roots or for other reasons (unmodeled stochasticity of spread, imperfect sampling, etc.). It isn't easy to convert those pandemic parameters into that penalty, so it makes sense to me that they didn't try, and instead switched to the SIR-type simulations in "Separate", which they're treating as their most important result.
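The per-root penalty point can be sketched as a toy penalized-likelihood comparison. This is a generic model-selection illustration (similar in spirit to AIC or BIC), not the paper's method; the function names are hypothetical, and the input is assumed to be the best log-likelihood one could fit for trees with 1, 2, 3, ... roots. Because that best fit can only improve as roots are added, the selected number of roots is driven entirely by the penalty, and `single_root_threshold` reports how large the penalty must be before a single root wins.

```python
def best_root_count(log_likelihoods, penalty_per_root):
    """log_likelihoods[i] is the best log-likelihood achievable with i + 1 roots.

    Returns the root count that maximizes log-likelihood minus
    penalty_per_root for each root beyond the first (ties go to fewer roots).
    """
    # Nested models: allowing another root can't make the best fit worse.
    if any(b < a for a, b in zip(log_likelihoods, log_likelihoods[1:])):
        raise ValueError("log-likelihoods should be non-decreasing in the number of roots")
    scores = [ll - penalty_per_root * i for i, ll in enumerate(log_likelihoods)]
    return scores.index(max(scores)) + 1


def single_root_threshold(log_likelihoods):
    """Smallest per-root penalty at which best_root_count selects one root."""
    base = log_likelihoods[0]
    return max(
        ((ll - base) / i for i, ll in enumerate(log_likelihoods) if i > 0),
        default=0.0,
    )
```

With zero penalty, any strictly increasing sequence of log-likelihoods selects the largest number of roots offered, so "more roots fit better" carries no information by itself. The whole question moves into choosing the penalty, which is the quantity that's hard to derive from R0, serial interval, and k.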
As I noted earlier, I don't believe it's possible to reach any confident conclusion (on research-related vs. natural origin, the number of introductions into humans, or most of the other major points of contention) from the evidence currently available. I'd have little objection to this paper if it were framed as exploratory work whose speculative conclusions shouldn't be trusted without further verification. That's not how Worobey and others have portrayed it in the popular media, though, nor how you initially portrayed it here.