by zack-m 1177 days ago
Corresponding paper to RFdiffusion: https://www.biorxiv.org/content/10.1101/2022.12.09.519842v2

Some context: I've been waiting for this to come out for a while! The main innovation is leveraging RoseTTAFold (a protein folding neural net) to generate protein backbones via diffusion in 3D space! From those backbones, we can generate sequences that should fold into said structures using sequence design algorithms (check out ProteinMPNN and Rosetta FastDesign).
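To make the "diffusing in 3D space" idea a bit more concrete, here is a toy sketch of a reverse-diffusion loop over C-alpha coordinates. It is not RFdiffusion's actual code or noise schedule: `ideal_helix`, `toy_denoiser`, `sample_backbone`, and the blend/noise schedule are all hypothetical names and choices made up for illustration, and the "denoiser" simply returns an ideal alpha helix where the real method would call a trained network.

```python
import math
import random


def ideal_helix(n_residues):
    """C-alpha trace of an ideal alpha helix:
    radius 2.3 A, 100 degrees of rotation and 1.5 A of rise per residue."""
    return [
        (
            2.3 * math.cos(math.radians(100 * i)),
            2.3 * math.sin(math.radians(100 * i)),
            1.5 * i,
        )
        for i in range(n_residues)
    ]


def toy_denoiser(noisy_coords, step):
    """Hypothetical stand-in for the trained network.

    A real model would predict the clean backbone from noisy_coords and
    the step; this toy ignores both and always answers with a helix."""
    return ideal_helix(len(noisy_coords))


def sample_backbone(n_residues=12, n_steps=50, seed=0):
    """Start from Gaussian noise and repeatedly step toward the denoiser's guess."""
    rng = random.Random(seed)
    # Pure noise: each residue is a random point in 3D space.
    coords = [
        tuple(rng.gauss(0, 10) for _ in range(3)) for _ in range(n_residues)
    ]
    for step in range(n_steps, 0, -1):
        predicted = toy_denoiser(coords, step)
        # Assumed schedule: small moves early, a full move on the last step.
        blend = 1.0 / step
        # Assumed schedule: re-inject a little noise, none on the last step.
        noise_scale = 0.5 * (step - 1) / n_steps
        coords = [
            tuple(
                c + blend * (p - c) + rng.gauss(0, noise_scale)
                for c, p in zip(residue, target)
            )
            for residue, target in zip(coords, predicted)
        ]
    return coords


if __name__ == "__main__":
    backbone = sample_backbone()
    # Distances between consecutive C-alpha atoms of the sampled backbone.
    spacing = [math.dist(a, b) for a, b in zip(backbone, backbone[1:])]
    print([round(d, 2) for d in spacing])
```

The interesting part in the real method is the denoiser: because the network has learned what plausible protein structures look like, different noise seeds can end up at different, novel backbones rather than one fixed target as in this toy.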

In terms of applications: This is super relevant for our ability to create tightly binding protein binders (e.g., timely creation of proteins that bind to virus spike proteins) and to design enzymes from scratch!

Prior methods suffered from much lower success rates for generating “good” backbone structures. Extremely exciting!! If you want to learn more, check out the Baker group at UW!


So in essence, if I understand correctly, instead of generating Balenciaga Pope or arrested Trump fake images, we can now dream up fake protein things which may actually be viable for whatever purpose if synthesized in the real world?
Dreaming up a static three-dimensional structure does not guarantee that it is stable in a given environment, or that production of this structure in a lab is viable. A huge problem in the space is protein folding, which is concerned with figuring out how you get from an unfolded linear string of amino acids to this three-dimensional structure.

Folding takes into account many variables, and a big chunk of current experimental structure determination is concerned with controlling/adjusting these variables.

So this dreaming up will provide a potential “quicker way” into what a folded protein might look like, but it will not guarantee that we know how to actually produce it in the real world.

Disclaimer: someone correct me if I’m wrong. I might be rusty on the latest developments, as I left the field after my PhD.

Indeed there are many pitfalls between a protein sequence and something useful to humanity, but there is reason to believe the technique is capable of generating such proteins:

1) In the paper they express several of their designs and show stability via circular dichroism experiments. They also show size exclusion chromatography results indicating some of the proteins are of the expected size and are not aggregating.

2) Since RFdiffusion and ProteinMPNN, which generates the actual amino acid sequence, are trained using Protein Data Bank (PDB) data, it's reasonable to presume the predicted proteins will be well behaved. Solving a protein structure via, say, X-ray crystallography, EM, or NMR and depositing it into the PDB requires bucket loads of stable protein. I used several grams of recombinant protein for an X-ray structure I solved. Since the ML models are trained on well-behaved proteins, I can believe the generated proteins will also be well behaved.