by borroka
1627 days ago
But the OP was asking something different, namely why someone should focus so heavily on theory when, by the way, DL theory is very far from solid and trial and error is the common way of operating in ML and AI. The "model is in place, but I have no clue what it's doing, so it can fail without me understanding when and how" argument is a straw man. Especially for supervised learning, where we have a label for each data point, it is immediately clear whether the output of the model is "bunk, useless, or even harmful". There is no "fail silently by design". I have been working in the field for almost 20 years, in academia and in industry, and I do not start every PCA by thinking about eigenvectors and eigenvalues; if you asked me right now, without preparation, what those are, my answer would land somewhere between approximately right and wrong. But I fit many, many very accurate models.

For instance, if we put an MSE loss function on a classification NN with sigmoid outputs and trained it on a classification dataset, we could generate an entire zoo of "many, many very accurate models" as measured by MSE. But once your model returns outputs, how do you interpret them to predict a label for some input data? You could hack some algorithm together (e.g. argmax over the outputs) whose predicted labels are indistinguishable from those of the "correct" procedure, but the implied probabilities are so incorrect that no ML professional would be comfortable trusting anything it says, not least because they violate the condition that class probabilities are non-negative and sum to one: each sigmoid output lies in (0, 1), but nothing ties their sum to one. The explanation of why we use MSE, cross-entropy, or any other loss function, and of which output activations (hint: and probability distributions) each is typically paired with, has a very deep origin in the foundations of probability theory. It blows open a whole new way of thinking about statistical modelling, one that is not made available in any of the programs whose materials I've been exposed to.
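
To make the point concrete, here is a minimal NumPy sketch, not anyone's actual model. The logits and labels are made up for illustration, and the helper names `sigmoid` and `softmax` are mine. It shows that argmax over independent sigmoid outputs picks the same labels as softmax, while only the softmax outputs form a valid distribution over classes. The usual pairing comes from maximum likelihood: MSE is the negative log-likelihood of a Gaussian up to constants, and cross-entropy is that of a categorical (or Bernoulli) distribution.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function; each output is in (0, 1) independently.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Row-wise softmax; subtracting the row max is for numerical stability.
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for two inputs and three classes (illustrative values).
logits = np.array([[2.0, 1.0, -1.0],
                   [0.5, 0.4, 0.3]])
onehot = np.eye(3)[[0, 0]]  # assumed true class is 0 for both inputs

sig = sigmoid(logits)
soft = softmax(logits)

# Both activations are monotonic in the logits, so argmax agrees.
same_labels = np.array_equal(sig.argmax(axis=1), soft.argmax(axis=1))

# Only softmax rows sum to one; sigmoid rows are unconstrained.
sig_sums = sig.sum(axis=1)
soft_sums = soft.sum(axis=1)

# The two losses under discussion, computed on the matching activation.
mse = np.mean((sig - onehot) ** 2)
cross_entropy = -np.mean(np.sum(onehot * np.log(soft), axis=1))

print("same predicted labels:", same_labels)
print("sigmoid row sums:", sig_sums)
print("softmax row sums:", soft_sums)
print("MSE on sigmoid outputs:", mse)
print("cross-entropy on softmax outputs:", cross_entropy)
```

The predicted labels match, so accuracy alone cannot tell the two setups apart; the difference only shows up when you try to read the outputs as probabilities.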