by davecap1
912 days ago
There's a lot more to protein sequences than legos. I think the argument is that you don't need to train a model on fundamental organic chemistry/biochemistry, electrostatic protein interaction, hydrogen bonding, hydrophobic interaction, quantum mechanics, etc. in order for it to accurately predict protein sequences.

More generally, AI models (aka very large function graphs) are trained on tuples that represent mappings of inputs to outputs (input -> output). The idea is that whatever structure exists in those pairs is discovered by the training process: gradient descent tunes the parameters of the model/graph to compress the information contained in the data as well as it can. This means the model must uncover the quantum effects (or some close proxy for them) and encode them into the parameters in a way that makes compression/prediction possible [1].
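A toy sketch of that idea, nowhere near protein scale: the hidden rule y = 3x + 2, the function names, and the hyperparameters below are all made up for illustration. Gradient descent only ever sees (input -> output) pairs, yet it ends up with two parameters that stand in for all 200 of them.

```python
import random

def make_pairs(n, seed=0):
    # Hypothetical data source: the "structure" is the hidden rule y = 3x + 2.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 2.0) for x in xs]

def train(pairs, lr=0.1, steps=2000):
    # Model: y_hat = w * x + b, fitted by plain full-batch gradient descent
    # on mean squared error. The model is never told the rule, only the pairs.
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

pairs = make_pairs(200)
w, b = train(pairs)
print(round(w, 3), round(b, 3))  # should land very close to the hidden 3 and 2
```

The 200 pairs collapse into two numbers because the data has structure to exploit. Real models do the same thing with vastly more parameters and far messier structure.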
None of this is magic. Compressing data requires uncovering structures and symmetries that can be used to reduce the size of the data, and it turns out that gradient descent with lots of parameters manages to do that for a large class of problems, albeit at a very steep computational cost that requires billions of dollars for hardware and software (including nuclear power plants [2]). We are not going to get AGI with this approach, but fortunately I know how to make it happen for a mere $80B.
1: https://arxiv.org/abs/2305.15614
2: https://www.cnbc.com/2023/09/25/microsoft-is-hiring-a-nuclea...