by dreamlessfate
1347 days ago
My journey with Machine Learning so far:
:D Oh, nonlinear equations! This is something I know a lot about.
:) I think I see... so they use nonlinear equations in the activation function. This helps to create divergence, or sensitive dependence on initial conditions.
:| Wait, it's a sigmoid function?? Wtf, that's boring.
:( They're just trying to min/max a data set, and figure out probability as it relates to that min/max. But that sucks, because most of the interesting phenomena in nature exist in BETWEEN zero and one! All the fun, cool stuff happens in the middle! You can't reduce it down to a probability; there's no way that's going to do a good job describing anything!
https://iq.opengenus.org/relu-activation/
For many learning tasks ReLU performs better than sigmoid.
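One commonly cited reason (not spelled out above, so treat it as my gloss) is that the sigmoid's gradient never exceeds 0.25 and shrinks toward zero for large inputs, while ReLU's gradient stays at 1 for any positive input. A minimal sketch in plain Python, with helper names of my own choosing:

```python
import math

def sigmoid(x):
    # Logistic function, squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x == 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # Rectified linear unit: passes positive inputs through, zeroes the rest.
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

# Compare how much gradient each activation lets through at a few inputs.
for x in (-10.0, -1.0, 0.5, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

Stack several sigmoid layers and those small factors multiply together, which is one way to see why deep sigmoid networks can be slow to train.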
My favorite use of sigmoid functions is
https://scikit-learn.org/stable/modules/calibration.html
where they are very good at turning arbitrary scores (say from a full-text search engine) into probabilities. The IBM Watson team tried a lot of things and found that logistic functions dominated. Turning scores into probabilities was how Watson could decide if and when hitting the button would help win the game.
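To make the score-to-probability idea concrete, here is a rough sketch in the spirit of the sigmoid (Platt-style) calibration described on that scikit-learn page. It is hand-rolled in plain Python, not scikit-learn's implementation, and the scores and relevance labels are made up for illustration:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical raw scores (e.g. from a search engine) and whether each hit was relevant.
scores = [0.5, 1.2, 2.0, 2.8, 3.5, 4.1, 5.0, 6.2]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss.
a, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(20000):
    grad_a = grad_b = 0.0
    for s, y in zip(scores, labels):
        err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
        grad_a += err * s
        grad_b += err
    a -= learning_rate * grad_a / len(scores)
    b -= learning_rate * grad_b / len(scores)

def to_probability(score):
    # Map a raw score to an estimated probability of relevance.
    return sigmoid(a * score + b)

for s in (0.5, 3.0, 6.2):
    print(f"score={s:.1f} -> p(relevant) ~ {to_probability(s):.3f}")
```

With only two fitted parameters, this kind of calibration needs very little labeled data, which is part of why it works well on top of an existing scorer.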
The big trouble with probabilities is that, potentially, every event is contingent on every other event, and the joint probability distribution of all possible inputs and outputs lives in a very high-dimensional space. In principle you could learn any function by sampling the joint probability distribution exhaustively, but in practice you can't get that much data. The miracle of machine learning is that the methods we use can guess at the joint probability distribution of inputs and outputs with only a limited sample.
If there was one great unsolved problem of the "old AI", namely expert systems, it was doing logical reasoning over probability functions. It's not good enough to estimate that A has an 80% probability of being true; in general you need to estimate what the probability of A is if B is true and C is false. If the problem cooperates, you can use half-baked methods to reason about uncertainty (like the MYCIN medical diagnosis program), but general and correct methods are elusive.
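A toy illustration of that last point, assuming you somehow had the full joint distribution over three yes/no facts A, B and C (the numbers below are invented so that A comes out at 80% overall):

```python
# Hypothetical joint distribution over (A, B, C); the eight entries sum to 1.
joint = {
    (True,  True,  True):  0.10,
    (True,  True,  False): 0.30,
    (True,  False, True):  0.25,
    (True,  False, False): 0.15,
    (False, True,  True):  0.05,
    (False, True,  False): 0.05,
    (False, False, True):  0.04,
    (False, False, False): 0.06,
}

def prob(event):
    # Probability of an event, given as a predicate over (a, b, c).
    return sum(p for (a, b, c), p in joint.items() if event(a, b, c))

def conditional(event, given):
    # P(event | given) = P(event and given) / P(given).
    return prob(lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(given)

p_a = prob(lambda a, b, c: a)
p_a_given = conditional(lambda a, b, c: a, lambda a, b, c: b and not c)

print(f"P(A)             = {p_a:.3f}")
print(f"P(A | B, not C)  = {p_a_given:.3f}")
```

The single number P(A) hides the fact that the answer shifts once B and C are known. The table already needs 2^3 entries for three facts and doubles with every fact you add, which is why you can't just write the whole thing down for a real problem.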