| (1) From the viewpoint of ontology, binary classes are the most elemental, and they keep you away from the open pits of modelling that people are always walking into. For instance, where should a modern book on digital photography be filed in the library? Should it go in the 000's with computing? In the 700's under art? Or in the 600's with technology (an application of optics, electronics, etc.)? All these answers are right, but they are also wrong. (Like, why isn't computing filed with electronics in the 600's or math in the 500's?) If you're physically filing the book in a place in the library, you have to assign it one category out of all of those because it can only be in one place. If you're trying to do anything else and get correct answers, it is simultaneously true that the book is about how to use computer software (say Lightroom), about how to make art, and about the optical performance of lenses, but not about Asian languages, nuclear energy, or how to play casino games. There are certain cases where classes are mutually exclusive, and in those cases it is usually right to model that as a constraint rather than start with multi-class classification, which usually winds up like https://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevole... unless there is something structurally special about the problem. If you approach the classification of books as asking the question "Is this book about this topic?" the problem becomes tractable, because the reason a particular book that could be filed in multiple places is filed in one particular place is "because some librarian decided to file it there". You could never train an algorithm to reproduce the arbitrary decisions that different librarians make; you'd always have a high error rate. If the question is "Is this book about how to use computer software?" then you can get close to 100% accuracy. To attempt the first is to decide to fail at the very beginning.
Also, often the math works for binary classification and doesn't work for other kinds. See https://plato.stanford.edu/entries/arrows-theorem/ for one kind of problem which is trivial for two choices and intractable for more than two. (Funny, there are two kinds of people... the ones who know what the knobs of the oscilloscope do and the ones who don't!) (2) The visual cortex of your brain has layers much like the layers of a convolutional network. An anti-aircraft missile system has layers of processing: raw signals, from which momentary blips are discovered, which are assembled into tracks, etc. Matter is made of quarks and electrons; the quarks form protons and neutrons, which form nuclei, which are the core of atoms, which form molecules, etc. Insofar as we are not dying at 30, freezing in the dark, frightened of the howling of wolves, and believing everything happens because some god wants it to happen, it's because we see a hierarchical structure in the universe. If you had a million neurons all wired to each other, it would be an intractable problem to solve for the coefficients, because there are so many of them, not to mention so many symmetries that would let you trade these ones over here for those ones over there, which would make it hard to get started. The wiring diagram for your brain is not like the wiring diagram for a TV set, but your genetic code does wire certain populations of neurons in certain areas to other populations in other areas, and then the neurons fine-tune their coefficients based on your experience. And don't dismiss "programs need a beginning and end" and "input and output" as incidental; they're absolutely essential to writing a program. |
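To make the "works for two choices, breaks for more than two" remark in point (1) of the quoted comment concrete, here is a minimal Python sketch of Condorcet's paradox. It is a standard illustration related to Arrow's theorem, not the theorem itself. The ballots and the helper names (`pairwise_majority`, `condorcet_winner`) are hypothetical and made up for this example.

```python
def pairwise_majority(ballots, a, b):
    """Return whichever of a, b is ranked higher on more ballots (None on a tie)."""
    a_wins = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
    b_wins = len(ballots) - a_wins
    if a_wins == b_wins:
        return None
    return a if a_wins > b_wins else b


def condorcet_winner(ballots, options):
    """Return the option that beats every other option head-to-head, or None."""
    for candidate in options:
        if all(
            pairwise_majority(ballots, candidate, other) == candidate
            for other in options
            if other != candidate
        ):
            return candidate
    return None


# Two options: with an odd number of voters, majority rule always picks a winner.
two_way = [("A", "B"), ("B", "A"), ("A", "B")]

# Three options: a cyclic profile (hypothetical ballots, best option first).
three_way = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

print(condorcet_winner(two_way, ["A", "B"]))
print(condorcet_winner(three_way, ["A", "B", "C"]))
```

With the three-way ballots, A beats B, B beats C, and C beats A, so no option wins every head-to-head contest. The two-way vote has no room for a cycle like that.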
(2) Regarding: "beginning and end", "input and output"
It's my understanding that neural networks suck at learning periodic functions, one of the most basic kinds of function of importance to human society and natural science.
I'd argue that this isn't JUST because of the math, it's also the assumptions being made.
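One narrow way to see the periodic-function point is sketched below. It covers only a one-hidden-layer ReLU network with one input, an architecture I'm assuming for illustration (the comment above doesn't name one). Such a network is piecewise linear with finitely many kinks, so beyond its last kink it is a straight line and cannot keep oscillating. The weights are random and every name here is hypothetical.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def make_net(hidden_units, seed):
    """Random one-hidden-layer ReLU net: f(x) = c + sum(v * relu(w * x + b))."""
    rng = random.Random(seed)
    units = []
    for _ in range(hidden_units):
        # Keep |w| away from zero so every kink sits at a moderate x.
        w = rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 2.0)
        b = rng.uniform(-2.0, 2.0)
        v = rng.uniform(-2.0, 2.0)
        units.append((w, b, v))
    c = rng.uniform(-1.0, 1.0)
    return units, c


def forward(net, x):
    units, c = net
    return c + sum(v * relu(w * x + b) for w, b, v in units)


def last_kink(net):
    """Largest |x| at which some hidden unit switches on or off."""
    units, _ = net
    return max(abs(-b / w) for w, b, _ in units)


def second_difference(net, x, h=1.0):
    """f(x+h) - 2 f(x) + f(x-h); zero wherever f is affine on [x-h, x+h]."""
    return forward(net, x + h) - 2.0 * forward(net, x) + forward(net, x - h)


net = make_net(hidden_units=16, seed=0)
far_out = last_kink(net) + 10.0
print(second_difference(net, far_out), second_difference(net, -far_out))
```

Past the last kink the second difference is zero up to floating-point rounding, whatever the weights are. For sin(x) the same quantity is 2·sin(x)·(cos(h) − 1), which keeps oscillating. Training can only move the kinks around inside the data range; it cannot remove the straight tails. This says nothing about architectures with periodic activations or recurrence.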
Regarding: "visual cortex of your brain has layers".
You're talking about an instrument of data collection & data filtering. Not an instrument of inference.
Keep digging deeper... and deeper... you will never find a neuron that can recognize your Grandma. Or your cat. Or that guy you hate at the grocery store.
https://en.wikipedia.org/wiki/Grandmother_cell
This is part of the problem with reductionist thinking and the reductionist approaches I see in ML.
I'll try to expand and explain more tomorrow on where I'm coming from (nonlinear dynamics, complex analysis and chaos theory).