I'm trying to learn the nuts and bolts of Machine Learning, but the more I dig in, the stupider the assumptions seem to be. The thought keeps popping into my head over and over again: just because it works doesn't mean it works well, or that it works in a smart, optimal, or even ontologically truthful/useful/realistic way.

There is no shortage of videos, papers, tutorials, and blogs that explain the math and models in detail. But there are exceptionally few sources that explain the underlying assumptions... and why these are useful (or not useful) assumptions. Why does Machine Learning use these assumptions?

1) Sigmoid Functions & Binary Classification - I understand the math and the probabilities. My question is rather: WHY would you want to classify using a binary system of classification? WHY would you want to reduce everything to yes/no? Or more accurately, a probability of yes/no? Or even chained probabilities of yes/no? Is it just due to being stuck in the paradigm of programming on machines built on yes/no logic gates? Of trying to perform these very complex tasks (identification, generation, whatever) on CPUs and software that are, in and of themselves, built on binary distinction? If all you have is a binary logic gate (hammer), then everything looks like a cumulative distribution function (nail)? Isn't this a totally moronic approach? Or is it just the best we've got? I feel like it's stuck back in the signal processing days of trying to "fit" and force a signal to achieve a certain pattern without realizing the what or why. Turning knobs on an oscilloscope.

2) Layers - Why are artificial neural networks set up as "layers"? Isn't this more like an assembly line? Doesn't that seem dumb? Why would someone believe, in their heart of hearts, that intelligence or pattern recognition, or any kind of thinking, happens procedurally? Doesn't this (again) seem like a very moronic approach? One that is based on the procedural nature of the machine itself? And the programmer themself? And not the nature of thinking, intelligence, or even complex analysis / complex systems? Complex systems with lots of variables and lots of dimensions don't actually interact like this. They don't have "layers"; this is a totally made-up assumption that has major implications for the entire field. Was this just chosen out of necessity, because software and programs need a beginning and an end? An input and an output? Or is there some really convincing argument that speaks to the philosophy and ontology of these decisions?
2. Layers are there so that you can have nonlinearities between them, and so that different layers can have varying complexity and structure. In practice this works much better than all the alternatives that we've tried.
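To make the point about nonlinearities between layers concrete, here is a minimal sketch in plain Python. The weights `W1` and `W2` and all function names are made up for illustration; they do not come from any library or real model. It shows that two stacked linear layers with nothing in between collapse into a single linear map, while putting a ReLU between the same two layers yields a function (here, |x0 - x1|) that no single linear layer can represent.

```python
# Toy illustration only: W1, W2 and every function name here are hypothetical.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    """Elementwise nonlinearity: negative entries become zero."""
    return [max(0.0, vi) for vi in v]

W1 = [[1.0, -1.0],
      [-1.0, 1.0]]   # first layer: computes (x0 - x1, x1 - x0)
W2 = [[1.0, 1.0]]    # second layer: sums its two inputs

def linear_stack(x):
    """Two layers with no nonlinearity in between."""
    return matvec(W2, matvec(W1, x))

def collapsed(x):
    """The single linear layer W2 @ W1, equivalent to linear_stack."""
    return matvec(matmul(W2, W1), x)

def relu_stack(x):
    """The same two layers with a ReLU in between; computes |x0 - x1|."""
    return matvec(W2, relu(matvec(W1, x)))

if __name__ == "__main__":
    x = [3.0, 5.0]
    print(linear_stack(x), collapsed(x))  # identical: depth added nothing
    print(relu_stack(x))                  # differs: the nonlinearity matters
```

Without the ReLU, adding more layers only multiplies matrices together, so depth buys no extra expressive power. With it, each layer can bend the output of the previous one, which is the sense in which layers and nonlinearities go together.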
These things are explained very well even at a beginner level, and you aren't really questioning them deeply or proposing any alternatives; instead, you seem to be getting into philosophy.