by biophysboy 1177 days ago
I do viral bioinformatics for my job. Bioinformatics workflows analyze raw data to assemble sequences, build phylogenetic trees, etc. They can't just design a completely novel RNA sequence (this is not the same as de novo assembly). Scientists can certainly manipulate pre-existing genomes, synthesize the edited genome, and thereby synthesize viruses, but this involves a lot of trial-and-error, tedious wet-lab work. Also, research on making viruses more dangerous through manipulation is extremely controversial and regulated, so it's not like there is a wealth of scientific papers, experiments, or data that a natural language model could just suck up.

Also, I asked GPT to do some of these things you suggested and it said no. It won't even write a scientific paper.


I think you misunderstood my initial comment. The point I was trying to make is that the amplification of bad actors' abilities should be the concern, not AI going rogue and deciding to exterminate the human race.

If one were actually to try to do such a thing, you wouldn't need an LLM. For a very crude pipeline, you would need a good sequence-to-structure method such as AlphaFold 2 (or maybe a homology model), some thermodynamically rigorous protein-protein binding affinity prediction method (this is the hardest part), and an RL process like a policy gradient with an action space over possible single-point sequence mutations in, for example, the spike protein of SARS, to maximize binding affinity (or potentially minimize immunogenicity, but that's far harder).

But I digress: the technology isn't there yet, neither for an LLM to write that sort of code nor for the in-silico methods of modeling aspects of the viral genome. But we should consider that one day it may be, and that it could amplify the abilities of a single bad actor or enable altogether what was not possible before due to a lack of technology.

I probably misunderstood the details of where you think AI will accelerate things. You are worried about AI predicting things like protein structure, binding affinity, and immunogenicity. And using that info to do RL and find a sequence, basically doing evolution in silico. Is this a better representation? That it reduces the search space, requiring less real experiments?

I am basically just skeptical that these kinds of reductive predictions will eliminate all of the rate-limiting steps of synthetic virology. The assumptions behind the natural language input are numerous and would need to be tested in a real lab.

Also, we can already do serial passaging, where we just manipulate the organism/environment interaction to make a virus more dangerous. We don't need AI; evolution can do all the hard stuff for you.

It’s been blinded. Other actors will train AIs without such blinding. That’s obvious, but what is more nefarious is that the public does not know exactly which subjects GPT has been blinded to, which have been tampered with for ideological or business reasons, and which have been left alone. This is the area that I think demands regulation.

Definitely agree the blinding should not be left to OpenAI. Even if it weren't blinded, it would not significantly speed up the production of dangerous synthetic viruses. I don't think that will change no matter how much data is put into the current NLM design.