by bzbz
916 days ago
In your example, the order of the amino acids is sufficient to model the result directly: the sequence of amino acids generates the protein, which is either valid or invalid. All variables are provided within the data. In the original example, we are predicting the weather using only the previous day's weather. We may be able to build a model from whatever correlation exists in that data, but that is not the same as accurately predicting results if the real-world weather function is determined by the weather of surrounding locations, the time of year, and the moon phase. If the model does not have this data, and the data is essential to the result, how can it predict accurately? In other words: "Garbage in, garbage out". Good luck modeling an n-th degree polynomial function given only a fraction of the variables to train on.
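A rough sketch of the missing-variables point, under made-up assumptions: the "true" weather function below, its coefficients, and the names `prev_day`, `neighbours`, and `season` are all hypothetical and chosen only for illustration. It fits a straight line to the one variable we observe and compares the variance that explains with what the unobserved terms would explain.

```python
import random


def r_squared_single(x, y):
    """R^2 of the least-squares line y ~ a + b*x (the squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


random.seed(0)
n = 5000
prev_day = [random.gauss(0, 1) for _ in range(n)]    # the only input we observe
neighbours = [random.gauss(0, 1) for _ in range(n)]  # unobserved
season = [random.gauss(0, 1) for _ in range(n)]      # unobserved

# Hypothetical polynomial "weather function": mostly driven by unobserved inputs.
hidden = [2.0 * nb * s + 1.5 * s * s for nb, s in zip(neighbours, season)]
today = [0.5 * p + h for p, h in zip(prev_day, hidden)]

# Variance explained by a linear fit on the observed variable alone.
r2_partial = r_squared_single(prev_day, today)
# Variance that the unobserved terms would explain if we had them.
r2_hidden = r_squared_single(hidden, today)

print(f"previous day only: R^2 = {r2_partial:.3f}")
print(f"unobserved terms:  R^2 = {r2_hidden:.3f}")
```

Under these assumptions the previous-day model picks up only the small correlation that exists, while most of the variance sits in terms it never sees. A more flexible model would not help here, since the missing inputs are independent of the observed one.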
Electrostatic protein interactions, hydrophobic interactions, organic chemistry, etc.: these variables are in fact not provided within the data. Protein creation is not just _poof_, proteins. There are steps, interactions, and processes. You don't need to supply any of that to get a model that accurately predicts proteins. That is the main point here, not that you can predict anything with any data.