Hacker News
by ausername42027 1620 days ago
This is a semi-interesting piece (because of how poorly written it is and how hilariously confident the writer is), but it is embarrassing that the person writing claims to have a PhD in at least something semi-related to molecular biology, yet is trying to convince people that we can use independent probability when dealing with protein amino acid sequences. Certain sequences are seen time and time again in nature, just like certain traits visible to the naked eye have evolved independently. More importantly, this author dismisses natural mixing of SARS and HIV-1 as "ridiculous" and provides no other explanation. It is possible that SARS-CoV-2 was created in a lab, but it is also possible that HIV-1 and SARS-CoV-2 co-infected a host.

Edit: to elaborate, certain amino acid sequences are seen again and again, and others are practically impossible to see in nature because amino acids fold into 3D proteins...and each amino acid is like a Lego piece. There is a lot more detail needed to understand fully how tertiary and quaternary protein structures form...but it is also possible to understand it decently well after an undergraduate degree in biology, chemistry, biochemistry, etc. A PhD in the field knows this inside and out. If I had to guess, the author of this post has a PhD in something like statistics or computer science, and thinks that they can apply high school level math to two fields (molecular biology and biochemistry) that they do not even have a high school level of understanding of.

To make a very stretched analogy (I am a doctoral student in the life sciences and a hobby programmer), this blog post is like saying that some Java source code is stolen because four of the class names are the same between two projects, and then using the number of possible characters in the class names and the length of the name to run some statistics. The problem is that no one has class names like "AeNOQ92bA"...in fact the vast majority of 8-character sequences will never be class names. Just like the vast majority of amino acid sequences (likely) do not exist in nature. And then you dig deeper and find out the class name is something like "MainDashboardSupervisorTree" (I do not know Java so forgive me). Then you also find out that the author of both classes...is the same guy who moved companies and likes a particular naming convention, but never meant to copy stuff word for word. Similar to how SARS-CoV-2 could have naturally incorporated HIV-1 RNA into its genome when co-infecting a host.

3 comments

The HIV-1 protein sequence is the same but not the DNA sequence. The Moderna DNA sequence is identical. The odds of HIV-1 mixing with bat coronavirus, and then somehow finding the Moderna DNA sequence, while not astronomically low, are pretty low.
How are you calculating those odds? And then once you have calculated them, what are you using to determine your threshold for "low" odds? The odds that a bunch of atoms ended up turning into you and me and the rest of humanity are extremely low or extremely high depending on what your criteria for "low" odds are.

The number of bats in the world, the number of human hosts, and, probably most importantly, the number of individual virions within each infected host (and therefore the number of replication cycles) are...astronomically high. There are somewhere between 1 and 100 BILLION virions of COVID in each infected person. Now imagine a few thousand bats infected...we are already talking about maybe 1 QUADRILLION different virions (1,000 trillion). It only takes one virion to incorporate some very handy and fitness-increasing HIV-1 RNA into its genome and it is off to the races.
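
A rough back-of-the-envelope sketch of that arithmetic, in Python. The per-host range (1 to 100 billion virions) is taken from the comment; the host count of 3,000 is only an assumed stand-in for "a few thousand", and the same per-host range is assumed to apply to bats.

```python
# Back-of-the-envelope estimate of total virions across infected hosts.
VIRIONS_PER_HOST_LOW = 1e9    # 1 billion per host (figure from the comment)
VIRIONS_PER_HOST_HIGH = 1e11  # 100 billion per host (figure from the comment)
ASSUMED_HOSTS = 3_000         # hypothetical value for "a few thousand" hosts


def total_virions(hosts, virions_per_host):
    """Total virions if every host carries the same number of virions."""
    return hosts * virions_per_host


low = total_virions(ASSUMED_HOSTS, VIRIONS_PER_HOST_LOW)
high = total_virions(ASSUMED_HOSTS, VIRIONS_PER_HOST_HIGH)
print(f"{low:.0e} to {high:.0e} virions")  # prints "3e+12 to 3e+14 virions"
```

Under these assumptions the total lands between a few trillion and a few hundred trillion; hitting a full quadrillion (1e15) would take about 10,000 hosts at the upper per-host count. Either way the number of replication opportunities is enormous.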

> How are you calculating those odds?

I truly have no idea what the answer is, but as someone well versed in economics, econometrics, and statistics, to me the relevant odds are not about independent coin tosses etc. I would like to know P(A|B) where:

A: sequence of 30 nucleotides appears in a virus identified in nature

B: sequence appears in a patent application that predates discovery in nature

Seems to me that would involve a whole bunch of arithmetic, but that ought to be calculable using this database.
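
A minimal sketch of what that estimate could look like once the counts exist, in Python. The records below are made-up toy values, not data from any patent or sequence database, and the function name is hypothetical; it only shows that P(A|B) is the share of B-cases in which A also holds.

```python
def conditional_probability(records):
    """Estimate P(A|B) from (a, b) boolean pairs.

    a: the sequence appears in a virus identified in nature
    b: the sequence appears in an earlier patent application
    """
    b_count = sum(1 for _, b in records if b)
    if b_count == 0:
        raise ValueError("P(A|B) is undefined when B never occurs")
    both_count = sum(1 for a, b in records if a and b)
    return both_count / b_count


# Toy records, purely illustrative (NOT real data).
toy_records = [
    (True, True),
    (False, True),
    (False, True),
    (False, True),
    (True, False),
    (False, False),
]
p_a_given_b = conditional_probability(toy_records)
print(p_a_given_b)  # 0.25 for the toy records above
```

The hard part is not this division but building the records: deciding which sequences count, and checking each one against both the patent database and known natural genomes.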

the mutation rate simply is not as high as you imagine it to be. If it were, then we would (likely) see more wobbling around the wobble pairs in the cleavage site coding region.
I never mentioned mutation rate so your gotcha is not as crafty as you might think. Also a mutation in the sense you seem to be implying (random base pair changes due to lack of proofreading) is not really what I am talking about. I am talking about the HIV-1 genome being incorporated into SARS-CoV-2 by the host cell or either virus during replication. That is not that crazy when you have quadrillions of replication cycles and a selective pressure to mutate and incorporate new RNA.
We're talking about the odds of the founder event. My guess is the odds are about 1 in 3^{5-6} (3-ish wobble options, five to six wobble sites; I haven't looked at the sequence to confirm this). Probably someone can do a better analysis based on codon usage in humans. I suggest doing it as an exercise in understanding molecular biology.
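
For scale, the 3^{5-6} guess can be worked out directly. A short Python sketch, assuming exactly 3 equally likely options at each wobble site and independence between sites (both are simplifications of the guess above, not checked against the actual sequence or codon usage):

```python
WOBBLE_OPTIONS = 3  # "3-ish" synonymous options per site, per the guess above


def founder_odds(wobble_sites, options=WOBBLE_OPTIONS):
    """Number of equally likely synonymous spellings: options ** sites."""
    return options ** wobble_sites


for sites in (5, 6):
    print(f"{sites} wobble sites: 1 in {founder_odds(sites)}")
# 5 wobble sites: 1 in 243
# 6 wobble sites: 1 in 729
```

So the guess amounts to odds of roughly 1 in a few hundred for one specific synonymous spelling.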
Besides, an enormous number of people are now infected with both HIV and COVID-19. That alone enormously increases the chances of this happening naturally.
my favorite part about your response here is how you didn't answer any of the questions and gave a hand-wavey response about how they're wrong.
alright, do you want a specific answer? My guess is it's somewhere around one in 3^{5-6}-ish, for the singular founder event that establishes and fixes that sequence as the canonical sequence for the furin cleavage site of COVID-19.
Well sure, but given that viruses mutate on the order of millions of times a day (since they replicate on the order of gazillions of times a day) it's not terribly unreasonable that this sequence could have developed by accident, even through the relatively winding path you described.

Hell, the entire genetic code of every variant of COVID-19 exists somewhere in the digits of Pi, but that doesn't mean that mathematicians created COVID. This reeks of a slightly more advanced form of numerology to me.

While viruses do mutate a lot, if the sequence were that labile, then you would expect there to be a lot of divergence around the furin cleavage site. We don't really see that, so the site is stable. So, is there something special about that DNA sequence (including the synonymous wobble pairs) that makes this a random-walk gradient-descent minimum? Or is the mutation rate lower than you think?
All I'm saying is when people say "this mutation is super rare therefore it MUST have been manmade!", I'm skeptical. Even the most stable DNA sequences have mutations occasionally, because mutations happen for a bunch of totally uncorrelated reasons. It could have been manmade, but it could have also been made by a stray cosmic ray. To say a particular sequence "proves" that the virus is manmade is...sketchy at best.
Nobody is claiming proof. Don't move the goalposts.
This article is....suggestive, to say the least. So I'm at least responding to the article.
The mutation rate for proofreading ssRNA viruses is, AFAIK, about 10^-7 per nt per replication. If we don’t see a particular variant nt it is (off the top of my head) because it didn’t “survive” the drift events or because there is strong positive selection on the sequence.

(Preemptive) Synonymous mutations are able to be selected, as they affect speed of translation and protein folding due to stalled ribosomes.
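
To put the 10^-7 figure in context, here is a rough Python sketch of expected mutation counts. The per-nucleotide rate is the one quoted above; the genome length of about 30,000 nt is the approximate size of the SARS-CoV-2 genome; the replication count of 1e9 is an assumed illustrative value (one replication per virion at the low end of the per-host range mentioned elsewhere in the thread).

```python
MUTATION_RATE = 1e-7          # per nt per replication (figure quoted above)
GENOME_LENGTH_NT = 30_000     # approximate SARS-CoV-2 genome length
ASSUMED_REPLICATIONS = 1e9    # hypothetical number of replication events


def expected_mutations_per_genome(rate, genome_length):
    """Expected number of mutations in one genome copy per replication."""
    return rate * genome_length


def expected_hits_at_site(rate, replications):
    """Expected number of replications that mutate one specific nucleotide."""
    return rate * replications


per_genome = expected_mutations_per_genome(MUTATION_RATE, GENOME_LENGTH_NT)
per_site = expected_hits_at_site(MUTATION_RATE, ASSUMED_REPLICATIONS)
print(f"about {per_genome:.3g} mutations per genome per replication")
print(f"about {per_site:.3g} mutations at a given site across all replications")
```

With these inputs, any single site is expected to be mutated on the order of a hundred times in one host, which is why a variant's absence points to drift or selection rather than to the mutation never having occurred.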

The person who wrote the article is pseudonymous, so their credentials cannot be examined or questioned.
They are presenting verifiable evidence are they not? Why do their credentials matter outside of ad hominem attack?
Because they clearly are not qualified in the field based on the quality of the analysis, yet they act as if they have to hide even while claiming a "PhD". Their blog post reads like someone who works on the backend for BLAST but knows nothing about genomes or proteins.
So, "their credentials cannot be examined or questioned" yet "they clearly aren't qualified in the field based on the quality of the analysis"?

Why, according to you, are they both worth and not worth examining?

The evidence, and more crucially, its meaning and significance are not "verifiable" by 99.999% of the population. That's why things like credentials, resumes, publication history, and peer review exist. That's why we care if someone has promulgated 50 hoaxes in the past, for instance.

So yeah. Credentials do matter, and it matters that this person is not willing to say this out loud, in public, without using a pseudonym.

Alright. I can verify the meaning and significance. I have a PhD in chemical biology, and have done (non-pathogenic) gain of function research. This is the sort of result that I would expect someone to have found if I were given the charge to insert a furin cleavage site into a virus. Proof: https://bmcbiochem.biomedcentral.com/articles/10.1186/1471-2... , in which I stack-overflow-copy-paste ideas for beneficial mutations from sequences in distant species.
No, you are wrong and you don't know how peer reviewing works.

Reputable journals and other publication outlets use something called "Double Blind peer review" precisely to prevent that the reputation of a researcher could skew the peer review process.

If you want to review and cast a judgement for the points presented in the article, you should do it only by refuting or confirming the content of the text itself. Not because it was written by Einstein or Donald Trump.

> Reputable journals and other publication outlets use something called "Double Blind peer review" precisely to prevent that the reputation of a researcher could skew the peer review process.

Double blind peer review is INCREDIBLY UNCOMMON among high impact biomedical journals

Some journals and conferences use double blinding. Not all do. In some, the reviewers can see the names of the authors. In others, the authors can propose reviewers, or ask that someone be excluded from review. Journals and conferences may change their rules from one year to the next.

Anyway the blinding, when it exists, is as much to protect the reviewers from retribution by asshole authors, as it is to avoid biasing the reviewers by the reputation of the authors.

First, I do know how peer review works.

Second, you're missing the key difference, here: peer review is for review and evaluation within the community of qualified scientists.

What I'm talking about is something else: the ability of the reasonably well-informed public to evaluate claims, even though they lack domain expertise.

I also think the article is wrong, but I disagree with both your points (as a scientist myself). Verifiable evidence does not need to be verifiable by some threshold percentage of the population. If that were the case, most math PhD theses would not be verifiable. Also, writing under a pseudonym is quite understandable given the polemic nature of the topic. If anything, I think we should have more people writing under pseudonyms, not fewer. We could end up with a lot more interesting ideas circulating. The peer review itself must of course be done by trustworthy third parties, but the source of the text need not be.
The credentials don't matter to experts in the field, but this forum is filled with laypeople. No layperson should accept such an article without expert vetting.
No. People can evaluate the evidence for themselves. We do not need authority figures to tell us what to think. This is particularly true when it comes to assessing probabilities, an activity at which most experts have proven hilariously incompetent.

(Musing: What is the probability that a genetics expert's opinion on the probabilities at play would make the discussion less well-informed rather than more well-informed?)

> So yeah. Credentials do matter, and it matters that this person is not willing to say this out loud, in public, without using a pseudonym.

The problem is that we've come so far that him not saying this out loud does not necessarily hint at him being a charlatan. Plausibly, that individual's cost/benefit analysis might not make it worth speaking the truth given how heated the matter is rn. If this sentiment propagates to other topics, I fear that larger (and important) parts of the world might drift towards Soviet-era incompetence.

If you were a peer reviewer for a reputable journal (double blind peer reviewing process), you would need neither the name nor the credentials of the person submitting an article. Actually, scientists and editors would laugh at you for asking for that.
An additional improbability: of all the millions of patents, why one from Moderna?