Hacker News
by etaoins 1611 days ago
Using the number of BLAST hits as the basis of an argument is about as reliable as using the number of results when searching for a string on GitHub. Without further analysis of the specific sequence and its biological context it can be highly misleading. See a previous Twitter thread on some other purported HIV inserts: https://twitter.com/trvrb/status/1223666856923291648

There also seems to be some circular reasoning in the argument. Apparently we can ignore RaTG13 because it’s obviously synthetic, which makes SARS-CoV-2 look even more synthetic. It would be interesting to compare against the BANAL family of SARS-CoV-2-related viruses, which are even more closely related to SARS-CoV-2 than RaTG13 is [1].

I’m not sure why only viral genomes were searched for the furin cleavage site sequence. Viruses famously exchange genetic material with their host organisms. The “smoking gun” sequence also appears in Mycobacterium smegmatis, for example [2].

[1]: https://www.nature.com/articles/d41586-021-02596-2
[2]: https://twitter.com/soychicka/status/1243547603746410500

The GitHub example is actually pretty spot-on. If you wrote a non-trivial piece of code in 2016 and then in 2020 it was used verbatim in another program, what would your conclusion be?
For a non-trivial piece of code, the conclusion might be that it was copied (horizontal code transfer??). But 6 amino acids are not a "non-trivial piece of code"; they are a short, trivial string of letters. So the more relevant question is: how often do we see "dogcat" on GitHub? (468 repository names.) Or try something more nonsensical: "ifdlog" (the last 6 letters of "goldfish" reversed) gives 1 repository and 25 code results, and "hsifdl" gives 1 code result.

Random things occur at random.
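To put rough numbers on the "random things occur at random" point, here is a minimal Python sketch of the expected number of chance occurrences of one fixed k-letter string. It assumes every letter is drawn independently and uniformly from the alphabet, which real protein sequences and real code do not satisfy, so treat the result as an order-of-magnitude estimate only. The function name `expected_occurrences` and the sequence lengths in the usage lines are hypothetical.

```python
def expected_occurrences(k: int, alphabet_size: int, sequence_length: int) -> float:
    """Expected number of exact matches of one fixed k-letter string in a
    random sequence, assuming independent, uniformly distributed letters
    (an idealization; real sequences are not uniform)."""
    if k < 1 or alphabet_size < 1:
        raise ValueError("k and alphabet_size must be positive")
    # Number of length-k windows that could hold a match.
    windows = max(sequence_length - k + 1, 0)
    # Chance that any single window equals the target string.
    p_window = alphabet_size ** -k
    return windows * p_window


if __name__ == "__main__":
    # 6 residues over the 20 standard amino acids: p = 1 / 20**6 = 1 / 64,000,000.
    print(expected_occurrences(6, 20, 10**9))
    # 6 lowercase letters over a 26-letter alphabet, e.g. "dogcat".
    print(expected_occurrences(6, 26, 10**9))
```

Under these assumptions a specific 6-residue motif is expected about 15.6 times per billion random residues, and the expected count grows linearly with how much sequence (or code) is searched, so a large enough database will contain chance hits.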

Sure, I won't argue that the code showing up somewhere is statistically interesting. What’s interesting is that it showed up exactly once, and in a Moderna patent. What’s the probability of that? That, of all the random strings in the Moderna patent, one would exactly match a string in the COVID genome and appear nowhere else?
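Under strong simplifying assumptions, part of that question can be framed as: what is the chance that one fixed k-letter string appears at least once in a random sequence of a given length? A minimal Python sketch follows. It assumes independent, uniform letters and treats overlapping windows as independent trials (only an approximation); the function name `prob_at_least_once` and the example parameters are hypothetical. The answer to the comment's question would also depend on how many candidate strings and sequences were compared, which this sketch does not model.

```python
import math


def prob_at_least_once(k: int, alphabet_size: int, sequence_length: int) -> float:
    """Approximate probability that one fixed k-letter string occurs at least
    once in a random sequence of the given length.

    Assumes independent, uniformly distributed letters and treats the
    overlapping windows as independent trials, which is an approximation.
    """
    if k < 1 or alphabet_size < 2:
        raise ValueError("need k >= 1 and alphabet_size >= 2")
    windows = max(sequence_length - k + 1, 0)
    p_window = alphabet_size ** -k
    # 1 - (1 - p)^windows, computed with log1p/expm1 so a tiny p
    # does not get rounded away.
    return -math.expm1(windows * math.log1p(-p_window))


if __name__ == "__main__":
    genome_length = 30_000  # assumption: roughly the size of a coronavirus genome
    for k in (6, 10, 15, 20):
        # Nucleotide alphabet: 4 letters.
        print(k, prob_at_least_once(k, 4, genome_length))
```

Once the probability is small, each extra nucleotide cuts it by roughly a factor of 4 under this model, so short matches are expected by chance while long exact matches quickly become unlikely.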