See for example the AlphaFold2 presentation linked here: https://predictioncenter.org/casp14/doc/presentations/2020_1.... Some samples that point out where most of the innovations are NOT just "huck a transformer at it":

====

Physical insights are built into the network structure, not just a process around it

- End-to-end system directly producing a structure instead of inter-residue distances
- Inductive biases reflect our knowledge of protein physics and geometry
- The positions of residues in the sequence are de-emphasized
- Instead residues that are close in the folded protein need to communicate
- The network iteratively learns a graph of which residues are close, while reasoning over this implicit graph as it is being built

What went badly:

- Manual work required to get a very high-quality Orf8 prediction
- Genetics search works much better on full sequences than individual domains
- Final relaxation required to remove stereochemical violations

What went well:

- Building the full pipeline as a single end-to-end deep learning system
- Building physical and geometric notions into the architecture instead of a search process
- Models that predict their own accuracy can be used for model-ranking
- Using model uncertainty as a signal to improve our methods (e.g. training new models to eliminate problems with long chains)

====

Also you can read the papers, e.g. https://www.nature.com/articles/s41586-019-1923-7 (available if you search the title on Google Scholar; also https://www.nature.com/articles/s41586-021-03819-2_reference...). There is actual, real, good science, physics, and engineering going on here, as compared to e.g. LLMs or computer vision models that are just trained on the internet, and where all the engineering is focused on managing finicky training and compute costs. AlphaFold requires all of that engineering and more.

EDIT: Basically, the article makes it sound like deep models just allowed scientists to sidestep all the complicated physics etc. and magically solve the problem. While this is arguably somewhat correct for computer vision and much of NLP, it is the exact opposite of the truth for AlphaFold.
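
To make the "residues that are close in the folded protein need to communicate" point from the slides a bit more concrete, here is a toy sketch of one way such an inductive bias can be expressed: attention logits get an additive term from a pairwise "who is close to whom" table, so two residues can attend to each other strongly even when they are far apart in the sequence. This is only an illustration under my own assumptions, not AlphaFold's actual code; the names `attention_weights` and `pair_bias` and all the numbers are made up for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, pair_bias=None):
    """Row-wise softmax over attention logits.

    scores[i][j]:    content-based logit for residue i attending to residue j
    pair_bias[i][j]: hypothetical additive term taken from a pairwise
                     "which residues are close" representation (assumption)
    """
    n = len(scores)
    weights = []
    for i in range(n):
        logits = [
            scores[i][j] + (pair_bias[i][j] if pair_bias is not None else 0.0)
            for j in range(n)
        ]
        weights.append(softmax(logits))
    return weights


# Toy setup: 4 residues with uninformative content scores.
scores = [[0.0] * 4 for _ in range(4)]

# Made-up bias: residues 0 and 3 are in contact in the fold,
# even though they sit at opposite ends of the sequence.
pair_bias = [[0.0] * 4 for _ in range(4)]
pair_bias[0][3] = pair_bias[3][0] = 3.0

plain = attention_weights(scores)              # no structural prior
biased = attention_weights(scores, pair_bias)  # with the pairwise prior
```

Without the bias, residue 0 spreads its attention evenly over all four positions; with it, most of residue 0's attention goes to residue 3. In a real model the pairwise table would itself be learned and updated iteratively rather than hard-coded as it is here.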