HN Mirror

Hacker News


	by matt_langston 1182 days ago
	I wonder why that parameter is called "h"? Hmmm ... Why not just say the word "hysteresis" and bring some magnets to class for show-and-tell to help your students develop an intuition for the "h" parameter in RNNs.


tysam_and 1181 days ago

Like jimsimmons said below, I believe it traditionally refers to 'hidden', a term that was in vogue at the time for feedforward nets, RNNs, and other neural networks from the 1990s or so onward. The convention stuck around for a while: I learned it in one of Hinton's main online classes, which was made sometime between 2012 and 2015 IIRC (though I switched to reading and trying to implement papers directly instead, since my brain works a bit strangely when it comes to intuition).

You can think of it as everything the RNN knows about what you're doing so far, a thing that evolves from step to step as you go. Because it is fed back into itself as an iterated map, it has some very interesting properties that let it represent some very difficult functions, though in my experience actually learning a representation of those functions is rather difficult indeed.
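A minimal sketch of that "iterated map" idea, assuming a scalar hidden state and arbitrary hand-picked weights. The names `rnn_step`, `run`, `w_hh` and `w_hx` are hypothetical illustration choices, not taken from the lecture. It shows that two sequences ending in the same inputs still finish in different states, because h carries the history forward.

```python
import math


# Hypothetical toy scalar RNN: w_hh and w_hx are arbitrary illustration
# values, not weights from any lecture or library.
def rnn_step(h_prev: float, x: float, w_hh: float = 0.9, w_hx: float = 0.5) -> float:
    """One recurrence step: the new hidden state is a function of the old one."""
    return math.tanh(w_hh * h_prev + w_hx * x)


def run(xs: list[float], h0: float = 0.0) -> list[float]:
    """Iterate the step map over a sequence, returning every hidden state."""
    hs = []
    h = h0
    for x in xs:
        h = rnn_step(h, x)
        hs.append(h)
    return hs


# Two sequences that end with the same inputs but differ in the first one.
a = run([1.0, 0.0, 0.0])
b = run([-1.0, 0.0, 0.0])
print(a[-1], b[-1])  # different final states: the history lives in h
```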

There are one or two rather successful projects trying to keep RNNs both alive and competitive with transformers. I think they do very well on the whole, though transformers generally seem to have slightly better parameter efficiency.

I hope this helps with your question; please do let me know if you have any follow-up questions on this topic. (: (: :) :)


kalimanzaro 1181 days ago

Hmm, I read a tonne of RNN literature before 2020 and had never come across the term "hysteresis parameter" standing in for the hidden units. Is it a recent trend? Google seems to suggest so.


tysam_and 1181 days ago

I didn't mention anything at all about a hysteresis parameter.


jimsimmons 1182 days ago

Hidden state?


totetsu 1182 days ago

Hysteresis is also important to understand when working with radio networking.


zxexz 1182 days ago

And for analog electronics in general!


ajdegol 1182 days ago

And for nonlinear forcing of plasmas… but it’s been many years since my PhD.


matt_langston 1181 days ago

As defined by the lecturer herself, "h(t) = h(t-1)", the very definition of hysteresis.

My point is that the lecturer missed a golden opportunity to give her students a natural intuition of "h" that they can see, feel and touch and that will serve them well for their entire careers.

The only thing "hidden" about "h" is that hysteresis is hidden in plain sight in her lecture; maybe the lecturer did not know it herself.

Neural networks have an undeserved reputation for being mysterious, and maybe that is partly due to a lack of basic physics knowledge.


lupire 1181 days ago

> As defined by the lecturer herself, "h(t) = h(t-1)", the very definition of hysteresis.

How is that a definition of hysteresis?

Hysteresis is when state is a function of previous state, not identical to previous state.
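The distinction above can be made concrete with a hysteretic element such as a relay with two thresholds (the idea behind a Schmitt trigger). This is a minimal sketch; `relay` and the thresholds `low=0.3` and `high=0.7` are hypothetical illustration choices. The new state is computed from the previous state and the current input rather than copied, and the same input value yields different outputs depending on the path taken to reach it.

```python
# Hypothetical relay with hysteresis (Schmitt-trigger style): the thresholds
# low=0.3 and high=0.7 are arbitrary illustration values.
def relay(xs: list[float], low: float = 0.3, high: float = 0.7, state: bool = False) -> list[bool]:
    """Switch on at or above `high`, off at or below `low`; in between, keep the previous state."""
    out = []
    for x in xs:
        if x >= high:
            state = True
        elif x <= low:
            state = False
        # for low < x < high the previous state is kept unchanged
        out.append(state)
    return out


# The input 0.5 appears twice but produces different outputs,
# because the output depends on the history of the input.
print(relay([0.0, 0.5, 1.0, 0.5]))  # [False, False, True, True]
```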


matt_langston 1181 days ago

It's just simplified pseudocode using the lecturer's own notation from her slides to make my point.

The following is the lecturer's full TeX form if that helps:

h(t) = \tanh \left(h(t-1) W_{\text{hh}}^T+x(t) W_{\text{hx}}^T\right)
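A minimal pure-Python sketch of the quoted update, assuming the row-vector convention implied by the transposes (`h(t-1) W_hh^T`), so `W_hh` has shape (H, H) and `W_hx` has shape (H, D). Biases are omitted because the quoted formula has none. `matvec_t`, `rnn_cell`, and the example weights are hypothetical illustration choices, not the lecturer's code.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matvec_t(v: Vector, w: Matrix) -> Vector:
    """Row vector v times W^T: (v W^T)_i = sum_j W[i][j] * v[j]."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def rnn_cell(h_prev: Vector, x: Vector, w_hh: Matrix, w_hx: Matrix) -> Vector:
    """h(t) = tanh(h(t-1) W_hh^T + x(t) W_hx^T), with tanh applied elementwise."""
    a = matvec_t(h_prev, w_hh)
    b = matvec_t(x, w_hx)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


# Arbitrary 2-unit hidden state and 1-dimensional input, for illustration only.
w_hh = [[0.5, 0.0], [0.0, 0.5]]
w_hx = [[1.0], [-1.0]]
h0 = [0.0, 0.0]
h1 = rnn_cell(h0, [1.0], w_hh, w_hx)
h2 = rnn_cell(h1, [0.0], w_hh, w_hx)
print(h1, h2)  # h2 differs from h1: the state is transformed, not copied
```

Even with zero input at the second step, h(t) is a squashed linear function of h(t-1), so it generally differs from h(t-1).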

However, I don't want our readers to get distracted by line noise; h(t) = h(t - 1) makes my point.


woodson 1181 days ago

Back in the day, it would have been common to take some kind of statistical signal processing course before getting into neural networks. That would likely have covered a lot of these intuitions.
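As one example of that kind of intuition (an illustration, not something spelled out in this comment), a first-order recursive (IIR) low-pass filter, y[n] = a·y[n-1] + (1-a)·x[n], already has the "state is a function of the previous state" structure of an RNN, only linear and without the tanh. This is a minimal sketch; `one_pole_filter` and the coefficient `a=0.5` are hypothetical choices.

```python
def one_pole_filter(xs: list[float], a: float = 0.5, y0: float = 0.0) -> list[float]:
    """First-order IIR low-pass filter: y[n] = a*y[n-1] + (1-a)*x[n].

    Structurally an RNN step with a scalar state and no tanh nonlinearity.
    """
    ys = []
    y = y0
    for x in xs:
        y = a * y + (1 - a) * x
        ys.append(y)
    return ys


# Impulse response: a single spike decays geometrically, so the filter
# "remembers" past inputs with exponentially fading weight.
print(one_pole_filter([1.0, 0.0, 0.0, 0.0]))  # [0.5, 0.25, 0.125, 0.0625]
```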
