HN Mirror


by oasisbob 1372 days ago

Yes, I've reviewed the supplemental materials.

> because nothing else excludes an earlier (even September) first introduction into humans. With an earlier introduction and thus more extensive unsampled spread, it's much harder to insist that A and B would be first sampled in the same order in which they evolved in humans

The tMRCA clearly excludes an earlier introduction. Because the tMRCA is based on genetic diversity, you cannot calculate a tMRCA based on all the known samples, get a date, and then say "oh, geez- well, there was also wide cryptic spread before that." It just doesn't make sense. Pekar addresses this point directly.

A race between the first A and the first B is a strawman. Rather, it's the predominance of lineage B over A in the early pandemic which is interesting. It would be unexpected for lineage B to dominate if A came first. Much of the modeling is to get a handle on how unlikely that situation would be. It shouldn't be surprising that the models don't support it as being likely. (But, that's not the only evidence.)

If you're willing to actually think about and engage on the phylogeny - stop with the "just a few SNPs" nonsense, and ask yourself what you really think the early origins looked like. If it really was a single introduction - Was lineage A ancestral? Was B ancestral? A C/C ancestor? A T/T ancestor? All these have interesting problems being supported by the data.

Finally, after reading some of your earlier comments, I'm realizing that you're conflating several techniques from Pekar's paper, eg:

> Have you looked at Pekar's full model, as set out mostly in the supplementary materials? This isn't any standard molecular clock approach. It's a byzantine stack of plausible but somewhat arbitrary assumptions, ending in a simulated phylogenetic tree.

His epi simulations are separate from the tree-building, with the possible exception of rooting, which he was using the output of the models to inform. Otherwise, the epi modeling which everyone is hand wringing over is really separate and doesn't end "in a simulated phylogenetic tree."

There /are/ novel methods used in the tree building (eg, non-reversibility of base substitutions), but that's a whole separate technique.

> Essentially Pekar's argument is a "two introductions of the gaps"--that if their model of a single introduction doesn't conform to reality, then it must have been two introductions.

BS. Again - understanding the paradoxes and debate involved in rooting the tree is basically required to understand the importance of this paper. The existing data is confounding and didn't conform to a logical understanding of viral evolution. A separate introduction elegantly explains the existing evidence.

If their modeling isn't strong enough evidence for you, fine. But that's different than throwing everything out because you don't understand how "just a couple SNPs" can still provide sufficient resolution to make phylogenetic inferences possible. If you think that "just a couple SNPs" /don't/ provide enough for experts in the field to inform their phylogenies, at least get to that argument directly instead of throwing ignorant shade at an unrelated portion of the paper.

Thanks for the links to those other threads. Nod's was interesting, but AFAICT, way off-base, starting around "Needless to say, early winter in Wuhan is not the Mardi Gras."

Here's Pekar's earlier thread which I recently reread and found helpful for understanding the significance of the phylogeny (#20 is where he gets into how lineage A breaks the clock):

https://twitter.com/jepekar/status/1499840335349911553

and Worobey re-emphasizing that we're not just talking about a few SNPs, it's the shape of the tree which matters:

https://twitter.com/michaelworobey/status/157050467474223923...


tripletao 1371 days ago

I think you're talking about their model in "Inferring the MRCA of SARS-CoV-2", and I'm talking about their model in "Separate introductions of lineages A and B"? So you're saying they don't use the epi simulations to root and build the phylogenetic tree of real sampled genomes, which is true. I'm saying they do use the epi simulations to build a phylogenetic tree for each simulated pandemic, whose shape (polytomy structure) they then compare against the real tree:

> We simulated SARS-CoV-2–like epidemics (22, 23) with a doubling time of 3.47 days [95% highest density interval (HDI) across simulations, 1.35 to 5.44] (24–26) to account for the rapid spread of SARS-CoV-2 before it was identified as the etiological agent of COVID-19 (figs. S21 and S22, tables S3 and S4, and supplementary text). We then simulated coalescent processes and viral genome evolution across these epidemics to determine how frequently we recapitulated the observed SARS-CoV-2 phylogeny.

Coverage of this paper in the popular press usually said something like "study finds that SARS-CoV-2 arose from two introductions into humans", so I thought the latter was the more important result and started there. Like in your second link, Worobey says:

> [...] We then go on the explain, point by point, that it is not a two-mutation difference that is unexpected. It is a two mutation difference between two large clades like lineage A and lineage B, each displaying a MASSIVE polytomy at their root. This is something that [sic] DO NOT see in ~99.5% of simulations. That is the crux of the paper. Not the idea that two mutations can't happen in a single transmission event.

Are those "simulations" not the SIR-type epi simulations (followed by simulation of the mutations and sampling, then construction of the tree)? I believe his 99.5% is 100% minus the 0.5% from Figure 2C.

Their former model is of course independent of their SIR stuff, and indeed purports to independently establish tMRCA in humans too recent for significant cryptic spread. It carries a different set of plausible but arbitrary assumptions though, again about the stochasticity/overdispersion and sampling rate of early spread, just less directly.
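For readers who want a feel for the 3.47-day doubling time in the passage quoted above, here is a minimal sketch of what that number implies under plain deterministic exponential growth. This is not the authors' simulation, which is stochastic and includes coalescent and genome-evolution steps; the function names and the single-seed starting condition are assumptions made only for illustration.

```python
import math

# Doubling time taken from the quoted passage (days).
DOUBLING_TIME_DAYS = 3.47


def growth_rate(doubling_time_days: float) -> float:
    """Per-day exponential growth rate r such that exp(r * T_double) == 2."""
    return math.log(2) / doubling_time_days


def expected_infections(days: float,
                        doubling_time_days: float = DOUBLING_TIME_DAYS,
                        initial: float = 1.0) -> float:
    """Cumulative size after `days` under idealized, noise-free doubling.

    Assumption: a single seed infection and no stochastic extinction,
    which the real simulations do not assume.
    """
    return initial * 2 ** (days / doubling_time_days)


if __name__ == "__main__":
    print(f"growth rate per day: {growth_rate(DOUBLING_TIME_DAYS):.3f}")
    for d in (7, 14, 28):
        print(f"day {d}: about {expected_infections(d):.0f} infections")
```

The only point of the sketch is that the count grows quickly over a few weeks at this doubling time, so the assumed timing of the first infection strongly affects how much unsampled spread a model allows.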


oasisbob 1371 days ago

Glad we're on the same page about the multiple techniques now. Statements you made like, "Pekar et al. do some complicated phylogenetic modeling that purports to show the MRCA in humans is too recent" and "This isn't any standard molecular clock approach. It's a byzantine stack of plausible but somewhat arbitrary assumptions" made it clear there was confusion before. Their tree is based on a couple of novel modifications to established techniques. Your characterizations were inaccurate and laughable.

> It carries a different set of plausible but arbitrary assumptions though, again about the stochasticity/overdispersion and sampling rate of early spread, just less directly.

So you have problems not only with the authors' modeling, but with their base phylogeny too? Do you reject their tMRCA? Good grief.

I'm still looking forward to discussing the molecular phylogenetics of this paper sometime.


tripletao 1371 days ago

On reflection, I believe the first of my statements that you've quoted was indeed incorrect, and that I was also incorrect when I just wrote:

> Their former model [...] purports to independently establish tMRCA in humans too recent for significant cryptic spread.

Even if SARS-CoV-2 really entered humans in December, with minimal cryptic spread, that's still enough time for the two lineages to evolve in humans, since they're (sorry) just two SNPs apart. I believe Worobey knows this, and that's the reason why he emphasizes the "Separate introductions" model, since their polytomy thing--and not any question of time for cryptic spread--is their best and only argument to exclude that. So I was wrong to mention the tMRCA at all, since even perfect knowledge of that wouldn't tell us confidently how the two lineages arose.

The second of my statements seems correct to me. Not only is their argument for two introductions not a standard molecular clock approach, but it's not a molecular clock approach at all, since "Inferring" provides no support. Their only support comes from the polytomy thing in "Separate". This makes the accuracy of their epidemiological simulation highly relevant, thus the "hand-wringing" over that.

I'd note that you yourself referred me to "Separate", back in:

https://news.ycombinator.com/item?id=32258096

So why did you switch to "Inferring"? I guess we could discuss that too, but per above I don't believe that could provide significant support for two introductions into humans, and thus not for natural vs. research-related origin. Do you believe otherwise? Or do you just mean the approach is of general interest, independently of that question of origin?


oasisbob 1370 days ago

> Not only is their argument for two introductions not a standard molecular clock approach, but it's not a molecular clock approach at all, since "Inferring" provides no support

Okay, let's revisit this now that some of the terminology confusion is recognized.

"Inferring the MRCA of SARS-CoV-2" introduces their phylogenies. It was produced with BEAST as described in their methods. I believe this is the model you were referring to as "Inferring." Yes?

I don't understand what you're trying to say here. If you don't understand how their phylogeny helps support their theory of multiple introductions, I don't know what to tell you. Maybe just another clarification of what you're trying to say would help.

> I'd note that you yourself referred me to "Separate", back in ... So why did you switch to "Inferring"

Because we're discussing multiple things in the same paper?


oasisbob 1370 days ago

> Even if SARS-CoV-2 really entered humans in December, with minimal cryptic spread, that's still enough time for the two lineages to evolve in humans, since they're (sorry) just two SNPs apart.

This isn't the evidence the authors present. The argument isn't "there isn't enough time to go from A -> B." IIRC, I've seen similar acknowledgements that even more rare mutations have been observed in a single transmission during the course of the pandemic. They're just highly improbable.

The most direct evidence (as I see it) for B not evolving from A in humans is the unexpected lack of genetic divergence in lineage A compared to B. Lineage B should show a younger molecular clock; it doesn't.

> I believe Worobey knows this, and that's the reason why he emphasizes the "Separate introductions" model, since their polytomy thing--and not any question of time for cryptic spread--is their best and only argument to exclude that. So I was wrong to mention the tMRCA at all, since even perfect knowledge of that wouldn't tell us confidently how the two lineages arose.

Nonsense. The tMRCA is key evidence in how the lineages arose. One of the reasons for the epi modeling was to figure out the plausible time between the primary case and index case. It shows there were at most a few dozen people infected before the genetic diversity was captured through sampling. (`Results: Minimal cryptic circulation of SARS`)

I don't think you understand their argument here, at all.

> Not only is their argument for two introductions not a standard molecular clock approach, but it's not a molecular clock approach at all, since "Inferring" provides no support

> So why did you switch to "Inferring"?

I don't understand why you're bristling and reading into the terminology here. https://plato.stanford.edu/entries/phylogenetic-inference/

Please elaborate why you think their use of the molecular clock is novel. It's really not.

> Do you believe otherwise? Or do you just mean the approach is of general interest, independently of that question of origin?

As explained above, I think the authors provide compelling evidence of multiple introductions using solid phylogenetic inference and solid molecular epidemiology. Bottom line is that there simply isn't an alternate hypothesis which explains the available evidence, and they illustrate why.

Here's a video you might not have seen, with Pekar and Wertheim. I've cued up the portion with a great explanation of why the evidence in the MRCA and genomics is so important. If you're going to continue to try and tear down their arguments, you probably want to really get this part.

https://www.youtube.com/watch?v=TYqJCdqdkio&t=3330 (especially 1h12m45, and 1h19m)


tripletao 1370 days ago

I think I understand what Worobey and Pekar write on Twitter, though I disagree with much of it. I don't understand what you're saying, so I'm afraid we're still talking past each other.

Do you agree that there are two mostly-independent models in the paper, one described in the section titled "Inferring the MRCA of SARS-CoV-2", and another in the section titled "Separate introductions of lineages A and B"? When I write "Inferring" and "Separate", I am referring to the models described in the sections with titles beginning with those respective words.

You wrote earlier:

> His epi simulations are separate from the tree-building, with the possible exception of rooting, which he was using the output of the models to inform. Otherwise, the epi modeling which everyone is hand wringing over is really separate and doesn't end "in a simulated phylogenetic tree."

As to "Separate", I believe that's incorrect. That model begins with an SIR-type simulation, and outputs the shape (polytomy structure) of the phylogenetic tree of that simulated pandemic, which they compare against the shape of the real pandemic's phylogenetic tree. Do you disagree? If so, what do you believe is the output of that "Separate" model?

I agree that the "Inferring" model does not depend on the epidemic simulation. I don't believe the "Inferring" model provides significant support for two introductions though. I believe that's the reason why most public debate has been about "Separate".


oasisbob 1370 days ago

Yeah, I think we're basically on the same page with their methodology and models now.

I didn't realize you were nicknaming the models after the titles of the results sections, so I was quite confused, especially since we had both used those words in quoted passages and it sounded like you were referring to portions of our conversation. So yeah, talking right past each other.

No, the two models don't correspond to the results cleanly. ie, when the authors claim "Separate introductions of lineages A and B" in the results, they provide evidence from both. (They're presenting the results of the models in support of their phylogeny.) I agree that "Inferring the MRCA of SARS-CoV-2" is pretty much independent of the epi stuff.

> As to "Separate", I believe that's incorrect. That model begins with an SIR-type simulation, and outputs the shape (polytomy structure) of the phylogenetic tree of that simulated pandemic, which they compare against the shape of the real pandemic's phylogenetic tree. Do you disagree? If so, what do you believe is the output of that "Separate" model?

I thought we were over this. We both agree that one of the results of the epi simulations was sampled genetics and a resulting tree from the simulation. That doesn't mean that their phylogeny is the direct result of their epi simulations. Their simulations are in support of their phylogeny. Their theorized phylogeny essentially existed prior to the modeling, which is why I called them separate, i.e., independent.

The `Materials and methods summary` is quite clear, especially `Phylodynamic inference and epidemic simulations`.

edit: Our thread is too deep for HN, so you might not be able to reply? I'll try to keep an eye out for new replies if you want to fork off somewhere else.

But, where's your horse in this race? You speak a lot about what you think sucks and very little about what you actually believe here.

> I agree that the "Inferring" model does not depend on the epidemic simulation. I don't believe the "Inferring" model provides significant support for two introductions though. I believe that's the reason why most public debate has been about "Separate".

Funny. My theory is that most people don't have enough knowledge of molecular genetics to make heads or tails of the paper, and so are of course silent on those results. They didn't follow the debate over the past few years, and are showing up and trying to understand something without context or the requisite knowledge.

When you say "Public debate" you need to admit you're talking about a particular part of a particular website or two where a small number of people are picking at nits and can't even address the core of the findings the authors present here.
