| HN Mirror


by bzbz 916 days ago

In your example, the amino acid order is sufficient to directly model the result: the sequence of amino acids directly determines the protein, which is either valid or invalid. All variables are provided within the data.

In the original example, we are predicting weather using the previous day’s weather. We may be able to model using whatever correlation exists between the data. This is not the same as accurately predicting results, if the real-world weather function is determined by the weather of surrounding locations, time of year, and moon phase. If our model does not have this data, and it is essential to modeling the result, how can it model accurately?

In other words: “Garbage in, garbage out”. Good luck modeling an n-th degree polynomial function, given a fraction of the variables to train on.
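To make the missing-variable argument concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a real weather model: the "true" function is a made-up linear one, `hidden` stands in for whatever the model is not given (surrounding locations, time of year, and so on), and the coefficients 2.0 and 3.0 are arbitrary.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b with a single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(ys, preds):
    """Fraction of the variance in ys explained by preds."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical inputs: one the model sees, one it does not.
yesterday = [rng.uniform(-1.0, 1.0) for _ in range(n)]
hidden = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Assumed "real-world" function: depends on BOTH inputs.
today = [2.0 * x + 3.0 * h for x, h in zip(yesterday, hidden)]

# Model 1: trained on yesterday's value only.
a, b = fit_line(yesterday, today)
partial_preds = [a * x + b for x in yesterday]
r2_partial = r_squared(today, partial_preds)

# Model 2: also fit the leftover error on the hidden input.
# This is a simple stagewise fit, not a joint regression; it is
# adequate here only because the two inputs are independent.
residual = [y - p for y, p in zip(today, partial_preds)]
c, d = fit_line(hidden, residual)
full_preds = [p + c * h + d for p, h in zip(partial_preds, hidden)]
r2_full = r_squared(today, full_preds)

print(f"R^2 with yesterday only: {r2_partial:.3f}")
print(f"R^2 with hidden input:   {r2_full:.3f}")
```

The first model still recovers the correlation that exists (a slope close to 2), but with these made-up coefficients it can explain only about 4/13 of the variance in expectation, no matter how much data it sees. The rest is carried by the input it was never given.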

2 comments

famouswaffles 916 days ago

>All variables are provided within the data.

Electrostatic protein interactions, hydrophobic interactions, organic chemistry, etc.

All variables are in fact not provided within the data. Protein creation is not just _poof_ proteins. There are steps, interactions and processes. You don't need to supply any of that to get a model that accurately predicts proteins. That is the main point here, not that you can predict anything with any data.


jakderrida 915 days ago

> This is not the same as accurately predicting results, if the real-world weather function is determined by the weather of surrounding locations, time of year, and moon phase.

How many people have the "human intelligence" to do this? Especially more accurately than a computer trained on the same inputs and outputs (and without using a computer themselves)?
