
https://en.wikipedia.org/wiki/Grandmother_cell

Thank you for your thoughtful and detailed response. I'm digesting your points and have some research papers and thoughts I'm going to share tomorrow, but here's an immediate response on #2.

(2) Regarding: "beginning and end", "input and output"

It's my understanding that neural networks suck at learning from periodic functions, one of the most basic functions of importance to human society and natural science.

I'd argue that this isn't JUST because of the math, it's also the assumptions being made.
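On the math side, here is a minimal sketch of one reason periodic targets are awkward, assuming a one-hidden-layer ReLU network with a single input (the weights are random and the helper names are made up for illustration). Such a network is piecewise linear with finitely many kinks, so past the last kink it is a straight line and cannot keep oscillating, no matter how it was trained.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def make_net(hidden=8, seed=0):
    # Hypothetical toy network: random weights stand in for trained ones.
    rng = random.Random(seed)
    w = [rng.uniform(-2, 2) for _ in range(hidden)]  # input weights
    b = [rng.uniform(-2, 2) for _ in range(hidden)]  # biases
    v = [rng.uniform(-2, 2) for _ in range(hidden)]  # output weights
    return w, b, v

def net(x, params):
    w, b, v = params
    return sum(vi * relu(wi * x + bi) for wi, bi, vi in zip(w, b, v))

def last_kink(params):
    # Each hidden unit switches on or off at x = -b/w; beyond the
    # farthest such point every unit is fixed as either on or off.
    w, b, _ = params
    return max(abs(-bi / wi) for wi, bi in zip(w, b) if wi != 0.0)

def second_difference(f, x, h=1.0):
    # Discrete curvature: exactly zero for a straight line.
    return f(x + h) - 2.0 * f(x) + f(x - h)

params = make_net()
edge = last_kink(params)
x_far = edge + 10.0
curvature = second_difference(lambda x: net(x, params), x_far)
print("last kink at |x| =", edge)
print("curvature beyond it:", curvature)
```

The curvature comes out as zero up to floating-point rounding, whereas a sine wave keeps bending forever. This only covers extrapolation for this particular architecture; it says nothing about fitting a periodic function inside the training range.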

Regarding: "visual cortex of your brain has layers".

You're talking about an instrument of data collection & data filtering. Not an instrument of inference.

Keep digging deeper...And deeper...you will never find a neuron that can recognize your Grandma. Or your cat. Or that guy you hate at the grocery store.

This is part of the problem with reductionist thinking and the reductionist approaches I see in ML.

I'll try to expand and explain more tomorrow on where I'm coming from (nonlinear dynamics, complex analysis and chaos theory).

I'll warn you that I am an arch-reductionist and have a PhD in chaos theory.

Then you are exactly who I want to talk to and learn from!!

The kind of wicked problems they talk about involve not everybody being on board with solving the problem (e.g. the drug addict who doesn't want to stop using, the billionaire who would be 100,000 times poorer if wealth was evenly distributed) or not seeing the problem the same way (the white person who would be at most 10% poorer if wealth was evenly distributed but sure gets scared when somebody kicks down the door at the gas station and steals all the green Newports.)

I posted it before I had realized you had a background in chaos theory. It's being discussed on HN right now, so it seemed timely and relevant.

That said, the paper still touches on the same problems with reductionism and simplification of complex systems.

To quote from the paper:

We draw on the ‘reductive tendency’, a process through which individuals simplify complex systems into cognitively manageable representations. While simplified representations offer benefits, such as quicker decision-making, such representations are often inaccurate as they overlook the complexities of the problem at hand.

Compounding these factors is the nonlinear nature of wicked problems, where “cause and effect relationships are either unknown or highly uncertain.”

Second, wicked problems present potential entrepreneurs with “radical” uncertainty. Because of their specifically nonlinear and interrelated complexity, wicked problems “have no closed form definition.”

Multiple reasons have been offered as to why reduction is so common. For example, the ability to reason about complexity requires a range of components to be prioritized to understand how they relate within a system. As this is difficult, individuals adopt understandings that are simpler in nature, thereby reducing the perceived complexity of a problem (Feltovich, Spiro, and Coulson, 1993). Others suggest that the tendency is a habitual carry-over from the rudimentary and routinized way that beginners are introduced to a concept (Gibson and Spelke, 1983). For many individuals, simpler conceptual forms are often employed to introduce a topic (Feltovich et al., 1989). This may, however, set up path-dependent learning that relies on reduction as a crutch (Feltovich et al., 1986). Another argument arises from motivational psychology and the finding that people prefer a middle level of complexity in their lives; concepts that are too simple are deemed boring, while concepts that are too complex are off-putting and do not attract engagement (Berlyne, 1971).

Research has identified 11 dimensions or manifestations of the reductive tendency (Feltovich et al., 2004; Hmelo-Silver and Pfeffer, 2004). We organize these into three categories.

The first pertains to simplifying processes and entails four dimensions: continuous processes are simplified into ones with discrete steps; interactive processes that depend on each other are simplified to be independent and separated; concurrent processes are simplified to be sequential; and nonlinear explanatory relationships are simplified into linear ones.

The second category pertains to perspective restrictions. This category describes situations in which individuals minimize the importance of, or ignore altogether, facets or manifestations of phenomena. This category includes three dimensions whereby individuals simplify: concepts necessitating multiple representations to single ones; phenomena with numerous and ambiguous causal mechanisms to ones with simple and clear causal agents, and; concepts with covert or abstract elements to surface-level, apparent ones.

The third category contains four dimensions that pertain to forming standardized representations of phenomena. It captures situations in which individuals simplify: concepts necessitating dynamic understanding of inputs into static ones; heterogeneous schemes or facets of a phenomena into uniform or highly similar; context-sensitive phenomena into universal ones; and regularity to replace situations that are characterized by asymmetric, inconsistent, or complex patterns

^ This closely matches the major points I'm whining about.

https://www.businessballs.com/strategy-innovation/ashbys-law...

My take is that Ashby's Law rules the roost

Namely you have to simplify any problem in order to talk about it, solve it, teach it (making some of those reductions) but there is a certain amount of complexity that is fundamental to the problem.

For instance you can sometimes get away with treating a concurrent process as sequential, sometimes you can't.

The reductionist prays for the wisdom to know which simplifications they can get away with and which ones they can't. If your model captures the essential features you are OK, otherwise you are lost in the woods.

> "Namely you have to simplify any problem in order to talk about it, solve it, teach it (making some of those reductions) but there is a certain amount of complexity that is fundamental to the problem. For instance you can sometimes get away with treating a concurrent process as sequential, sometimes you can't. The reductionist prays for the wisdom to know which simplifications they can get away with and which ones they can't. If your model captures the essential features you are OK, otherwise you are lost in the woods."

Summed up beautifully.

My journey with Machine Learning so far:

:D Oh, nonlinear equations! This is something I know a lot about.

:) I think I see...so they use nonlinear equations in the activation function. This helps to create divergence, or sensitive dependence on initial conditions.

:| Wait it's a sigmoid function?? Wtf that's boring.

:( They're just trying to min/max a data set, and figure out probability as it relates to that min/max. But that sucks, because most of the interesting phenomena in nature exist in BETWEEN zero and one! All the fun, cool stuff happens in the middle! You can't reduce it down to a probability, there's no way that's going to do a good job describing anything!

https://iq.opengenus.org/relu-activation/

If you don't like sigmoids you might like ReLU

For many learning tasks ReLU performs better than sigmoid.
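One commonly cited reason is gradient saturation. A small sketch, using only the standard definitions of the two functions (nothing here is taken from the linked article): the sigmoid's slope peaks at 0.25 and collapses toward zero for large inputs, while ReLU's slope stays at 1 for any positive input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s), at most 0.25 (at z = 0).
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Slope is 1 for positive inputs and 0 otherwise
    # (the value at exactly z = 0 is a convention; 0 is used here).
    return 1.0 if z > 0.0 else 0.0

for z in (0.0, 2.0, 10.0):
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.6f}  relu'={relu_grad(z):.1f}")
```

Stack many sigmoid layers and those small slopes multiply together, which is the usual explanation for why deep sigmoid networks train slowly.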

My favorite use of sigmoid functions is

https://scikit-learn.org/stable/modules/calibration.html

where they are very good at turning arbitrary scores (say from a full-text search engine) into probabilities. For IBM Watson they tried a lot of things and found logistic functions dominated. Turning scores into probabilities was how Watson could decide if and when hitting the button would help win the game.
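To make the idea concrete, here is a hand-rolled sketch of sigmoid calibration: fit p = sigmoid(a * score + b) to labeled scores by gradient descent on the log loss. This is not scikit-learn's code and not what Watson ran; the scores and labels are invented, and `fit_sigmoid_calibrator` is a hypothetical helper. In scikit-learn the corresponding option is `CalibratedClassifierCV(method="sigmoid")`.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_sigmoid_calibrator(scores, labels, lr=0.1, steps=5000):
    # Plain batch gradient descent on the mean log loss of
    # p = sigmoid(a * score + b). Fine for a toy; real code
    # would use a proper optimizer.
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Made-up search-engine scores with relevance labels (1 = relevant).
scores = [0.5, 1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_sigmoid_calibrator(scores, labels)

def calibrate(score):
    return sigmoid(a * score + b)

for s in (0.5, 3.0, 6.0):
    print(f"score={s:.1f} -> p(relevant)={calibrate(s):.3f}")
```

The raw scores have no units, but the output can be compared against a threshold like "only buzz in if p > 0.5", which is the kind of decision described above.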

The big trouble with probabilities is that, potentially, every event is contingent on every other event, and the joint probability distribution of all possible inputs and outputs lives in a very high-dimensional space. In principle you could learn any function by sampling the joint probability distribution exhaustively, but practically you can't get that much data. The miracle of machine learning is that the methods we use can guess at the joint probability distribution of inputs and outputs with only a limited sample.
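A back-of-the-envelope sketch of why exhaustive sampling is hopeless, assuming the simplest possible case where every variable is binary (the function names are made up for illustration): the full joint table has 2^n cells, so the data you'd need blows up long before n gets interesting.

```python
def joint_table_cells(num_variables, values_per_variable=2):
    # Number of entries in a full joint probability table.
    return values_per_variable ** num_variables

def samples_per_cell(num_samples, num_variables, values_per_variable=2):
    # Average number of observations landing in each cell.
    return num_samples / joint_table_cells(num_variables, values_per_variable)

for n in (10, 20, 30, 100):
    print(f"{n:3d} binary variables -> {joint_table_cells(n):.3e} cells")

print("1M samples over 30 variables:", samples_per_cell(1_000_000, 30), "per cell")
```

With a million samples and just 30 binary variables, the average cell sees less than a thousandth of an observation, so almost all of the table is empty.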

If there was one great unsolved problem of the "old AI", namely expert systems, it was doing logical reasoning over probability functions. It's not good enough to estimate that A has an 80% probability of being true; in general you need to estimate what the probability of A is if B is true and C is false. If the problem cooperates you can use half-baked methods to reason about uncertainty (like the MYCIN medical diagnosis program) but general and correct methods are elusive.
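A toy illustration of the gap between "A is 80% likely" and "A given B and not C", using an invented joint table over three boolean events (the numbers are made up so that the marginal of A comes out at 80%; the helper names are hypothetical):

```python
# Made-up joint distribution over (A, B, C); the entries sum to 1.
joint = {
    (True,  True,  True):  0.30,
    (True,  True,  False): 0.05,
    (True,  False, True):  0.25,
    (True,  False, False): 0.20,
    (False, True,  True):  0.02,
    (False, True,  False): 0.13,
    (False, False, True):  0.03,
    (False, False, False): 0.02,
}

def probability(table, predicate):
    # Sum the mass of every outcome where the predicate holds.
    return sum(p for outcome, p in table.items() if predicate(*outcome))

def conditional(table, event, given):
    # P(event | given) = P(event and given) / P(given)
    denom = probability(table, given)
    if denom == 0.0:
        raise ValueError("conditioning on an event with zero probability")
    both = probability(table, lambda a, b, c: event(a, b, c) and given(a, b, c))
    return both / denom

p_a = probability(joint, lambda a, b, c: a)
p_a_given = conditional(joint,
                        lambda a, b, c: a,
                        lambda a, b, c: b and not c)

print(f"P(A)             = {p_a:.3f}")
print(f"P(A | B, not C)  = {p_a_given:.3f}")
```

Here the marginal says 80% but the conditional is about 28%, and getting that answer needed the whole joint table, which is exactly the thing you usually can't afford to store or estimate.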

BOOSTERHIDROGEN 1351 days ago

Thanks for the question and discussions. Any books about this knowledge? Very insightful.