| HN Mirror

	by masswerk 1203 days ago
	This dismissal of Minsky misses that he actually had extensive experience with neural nets (starting in the 1950s, with neural nets in hardware) and was, around 1960, probably the most experienced person in the field. Also, in Jan 1961, he published “Steps Toward Artificial Intelligence” [0], where we find not only a description of gradient descent (then "hill climbing"; compare sect. B in “Steps”, as this was still measured towards a success parameter and not against an error function), but also a summary of experiences with it. (Also, the eventual reversal of success into a quantifiable error function may provide some answer to the question of success in statistical models.)

	[0] Minsky, Marvin, “Steps Toward Artificial Intelligence”, Proceedings of the IRE, Volume: 49, Issue: 1, Jan. 1961: https://courses.csail.mit.edu/6.803/pdf/steps.pdf

1 comments

riku_iki 1203 days ago

Gradient descent was invented before Minsky. IMO, Minsky produced some vague writings with no significant practical impact, but that is enough for some people to claim a founder's role for him in the field.

masswerk 1203 days ago

Minsky was actually a pioneer in the field, when it came to working with real networks. Compare

[0] “A Neural-Analogue Calculator Based upon a Probability Model of Reinforcement”, Harvard University Psychological Laboratories, Cambridge, MA, January 8, 1952

[1] “Neural Nets and the Brain Model Problem”, Princeton Ph.D dissertation, 1954

In comparison, Frank Rosenblatt's Perceptron at Cornell was only built in 1958. Notably, Minsky's SNARC (1951) was the first learning neural network.

riku_iki 1203 days ago

> when it came to working with real networks. Compare

my understanding is that no one knows what that SNARC thing was: he built something on the grant, abandoned it shortly after, and only many years later did he and his fanboys start using it as the foundation for bold claims about his role in the field.

masswerk 1203 days ago

Well, his papers are out there to read.

riku_iki 1203 days ago

Yes, and I read them: https://dspace.mit.edu/bitstream/handle/1721.1/6103/AIM-048....

vague essay without specifics

masswerk 1203 days ago

So you may like this better:

> “Multiple simultaneous optimizers” search for a (local) maximum value of some function E(λ1, …, λn) of several parameters. Each unit Ui independently “jitters” its parameter λi, perhaps randomly, by adding a variation δi(t) to a current mean value μi. The changes in the quantities λi and E are correlated, and the result is used to slowly change μi. The filters are to remove DC components. This technique, a form of coherent detection, usually has an advantage over methods dealing separately and sequentially with each parameter.

(In “Steps”)

:-)
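
For anyone who wants to see the idea in motion, here is a minimal sketch of one way the quoted passage might be read, in Python. Everything specific in it is an assumption for illustration and not taken from “Steps”: the function name `simultaneous_optimizers`, the choice of a random ±jitter for each δi(t), the running average that stands in for the DC-removing filters, the step sizes, and the toy function `toy_E`.

```python
import random


def simultaneous_optimizers(E, mu, steps=3000, jitter=0.2, rate=0.005,
                            smooth=0.1, seed=0):
    """Hill-climb E by jittering all parameters at once (illustrative only)."""
    rng = random.Random(seed)      # seeded so runs are repeatable
    mu = list(mu)                  # current mean values, one per unit
    baseline = E(mu)               # slow running average of E (the "DC" level)
    gain = rate / jitter ** 2      # scales the correlation into a step size

    for _ in range(steps):
        # Every unit jitters its own parameter independently: +jitter or -jitter.
        delta = [rng.choice((-jitter, jitter)) for _ in mu]
        value = E([m + d for m, d in zip(mu, delta)])

        # Remove the DC component: keep only the deviation from the average.
        ac = value - baseline
        baseline += smooth * ac

        # Correlate each unit's jitter with the change in E and nudge its mean.
        mu = [m + gain * ac * d for m, d in zip(mu, delta)]
    return mu


def toy_E(params):
    """Assumed test function: a single peak at (1, -2)."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)


result = simultaneous_optimizers(toy_E, [0.0, 0.0])
print(result)  # should end up close to the peak at [1.0, -2.0]
```

Note that each step costs a single evaluation of E no matter how many parameters there are, which is one plausible reading of the claimed advantage over handling each parameter separately and in sequence.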
