by matt_langston 1182 days ago
I wonder why that parameter is called "h"? Hmmm ...

Why not just say the word "hysteresis" and bring some magnets to class for show-and-tell to help your students develop an intuition for the "h" parameter in RNNs?

4 comments

Like jimsimmons said below, I believe it traditionally refers to 'hidden', which was in vogue at the time for feedforward nets, RNNs, and other neural networks from the 90s or so onward. This trend actually continued for a while, and I learned it in one of Hinton's main online classes, which was made somewhere between 2012 and 2015 or so, IIRC (though I opted to switch to reading and trying to implement raw papers instead, since my brain works in a strange, intuitive way on the whole).

You can think of it as everything the RNN knows about what you're doing, a thing that evolves from place to place as you go. Because it is iterated on itself as a map, it has some very interesting properties that let it represent some very difficult functions, though in my experience actually attaining a representation of those functions is rather difficult.

There are one or two rather successful projects trying to keep RNNs both alive and competitive with transformers. I think they do very well on the whole, though the transformers seem to have slightly improved parameter efficiency, generally speaking.

I hope this helps with your question. Please do let me know if you have any follow-up questions on this topic. :)

Hmm, I read a tonne of RNN literature before 2020 and had never come across the term "hysteresis parameter" standing in for the hidden units. Is it a recent trend? Google seems to suggest so.

I didn't mention anything at all about a hysteresis parameter.

Hidden state?

Hysteresis is also important to understand for working with radio networking.

And for analog electronics in general!

And for nonlinear forcing of plasmas… but it’s been many years since my PhD.

As defined by the lecturer herself, "h(t) = h(t-1)", the very definition of hysteresis.

My point is that the lecturer missed a golden opportunity to give her students a natural intuition of "h" that they can see, feel and touch and that will serve them well for their entire careers.

The only thing "hidden" about "h" is that hysteresis is hidden in plain sight in her lecture - maybe the lecturer did not know herself.

Neural networks have an undeserved reputation for being mysterious, and maybe that is partly due to a lack of basic physics knowledge.

> As defined by the lecturer herself, "h(t) = h(t-1)", the very definition of hysteresis.

How is that a definition of hysteresis?

Hysteresis is when state is a function of previous state, not identical to previous state.
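
To make that distinction concrete, here is a minimal sketch of a classic hysteresis element, a two-threshold comparator (a Schmitt trigger). The function name and the threshold values are illustrative assumptions of mine, not anything from the lecture. The point is only that the new state depends on both the input and the previous state, rather than being a copy of the previous state.

```python
def schmitt_trigger(inputs, low=0.3, high=0.7, state=False):
    """Two-threshold comparator: a textbook example of hysteresis.

    `low`, `high`, and the initial `state` are illustrative assumptions.
    """
    outputs = []
    for x in inputs:
        if x >= high:
            state = True       # input is clearly high: switch on
        elif x <= low:
            state = False      # input is clearly low: switch off
        # Between the thresholds the previous state is kept,
        # so the output depends on the history of the input.
        outputs.append(state)
    return outputs


rising = schmitt_trigger([0.0, 0.5, 1.0])
falling = schmitt_trigger([1.0, 0.5, 0.0])
print(rising)   # [False, False, True]
print(falling)  # [True, True, False]
```

The same input value, 0.5, gives a different output depending on whether the signal arrived there from below or from above.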

It's just simplified pseudocode using the lecturer's own notation from her slides to make my point.

The following is the lecturer's full TeX form if that helps:

h(t) = \tanh \left(h(t-1) W_{\text{hh}}^T+x(t) W_{\text{hx}}^T\right)

However, I don't want our readers to get distracted by line noise; h(t) = h(t - 1) makes my point.
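
For anyone who prefers code to TeX, the full recurrence quoted above can be sketched in plain Python roughly as follows. This is a minimal sketch, not the lecturer's code: the function names, the toy weights, and the row-vector convention for h and x are my own assumptions, and there is no bias term because the quoted formula has none.

```python
import math


def rnn_step(h_prev, x, W_hh, W_hx):
    """One update h(t) = tanh(h(t-1) W_hh^T + x(t) W_hx^T).

    h_prev and x are treated as row vectors (plain lists);
    W_hh is hidden-by-hidden and W_hx is hidden-by-input.
    """
    h_new = []
    for row_hh, row_hx in zip(W_hh, W_hx):
        pre = sum(w * h for w, h in zip(row_hh, h_prev))   # h(t-1) W_hh^T
        pre += sum(w * xi for w, xi in zip(row_hx, x))     # x(t) W_hx^T
        h_new.append(math.tanh(pre))
    return h_new


def run_rnn(xs, h0, W_hh, W_hx):
    """Iterate the update over a sequence and collect every hidden state."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(h, x, W_hh, W_hx)
        states.append(h)
    return states


# Toy weights, chosen only for illustration.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_hx = [[1.0], [-1.0]]

# A single input pulse followed by zeros: the hidden state keeps
# a decaying trace of the pulse after the input has gone away.
states = run_rnn([[1.0], [0.0], [0.0]], [0.0, 0.0], W_hh, W_hx)
for t, h in enumerate(states, start=1):
    print(t, h)
```

With these toy weights the state is nonzero at steps 2 and 3 even though the input is zero there, which is the "function of previous state" behaviour under discussion.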

Back in the day, having taken some kind of statistical signal processing course would have been common before getting into neural networks. That would likely have covered a lot of intuitions.