| HN Mirror


by trott 2699 days ago

In an RNN, you could connect the hidden state at each time step t, h(t), to h(t-N) instead of, or in addition to, h(t-1). That makes it analogous to dilated convolutions, but with hidden-to-hidden connections within the same layer.

So I don't think RNNs are fundamentally more myopic than CNNs (just that there may be practical advantages to using the latter).

Hierarchical RNNs, Clockwork RNNs, Hierarchical Multiscale RNNs, and probably others do things of this nature.
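To make the h(t-N) idea concrete, here is a minimal NumPy sketch of a tanh RNN whose recurrent connection skips back `dilation` steps. The function name `dilated_rnn`, the weight names, and the choice to treat states before the start of the sequence as zeros are all assumptions for illustration, not taken from any of the papers mentioned.

```python
import numpy as np

def dilated_rnn(xs, W_x, W_h, b, dilation=1):
    """Run a tanh RNN where h(t) depends on h(t - dilation) instead of h(t-1).

    xs:  (T, D) input sequence
    W_x: (D, H) input-to-hidden weights
    W_h: (H, H) hidden-to-hidden weights
    b:   (H,)   bias
    """
    T = xs.shape[0]
    H = W_h.shape[0]
    hs = np.zeros((T, H))
    for t in range(T):
        # Assumption: hidden states before the sequence starts are zero.
        h_prev = hs[t - dilation] if t >= dilation else np.zeros(H)
        hs[t] = np.tanh(xs[t] @ W_x + h_prev @ W_h + b)
    return hs

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
xs = rng.normal(size=(8, 3))
W_x = rng.normal(size=(3, 4))
W_h = rng.normal(size=(4, 4))
b = np.zeros(4)

hs_dilated = dilated_rnn(xs, W_x, W_h, b, dilation=2)
```

Note that with only the h(t-N) connection, the sequence splits into N independent interleaved chains (even and odd time steps for N=2), which is one reason to keep the h(t-1) connection "in addition to" the dilated one.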


gwern 2699 days ago

You could, but it's not equivalent, and no one seems to have been able to use clockwork RNNs or related archs to achieve similar performance, so the differences would seem to make a difference.


trott 2699 days ago

Right. I'm just saying that this myopia is not a fundamental property of the recurrence any more than of convolution.

Clockwork RNNs subsample, BTW, so they are more analogous to stride=2 in CNNs than to dilation.
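A small sketch of the stride-versus-dilation distinction on the CNN side: the helper below (a hypothetical name, `conv_taps`, assuming a 1-D convolution with no padding) lists which input indices each output position reads. A stride of 2 drops output positions, while a dilation of 2 keeps every position and only spreads the taps apart.

```python
def conv_taps(length, kernel=2, stride=1, dilation=1):
    """For each output position, list the input indices a 1-D conv reads (no padding)."""
    span = (kernel - 1) * dilation + 1  # input width covered by one output
    return [
        [start + j * dilation for j in range(kernel)]
        for start in range(0, length - span + 1, stride)
    ]

strided = conv_taps(8, kernel=2, stride=2, dilation=1)
dilated = conv_taps(8, kernel=2, stride=1, dilation=2)

print(len(strided), strided)  # fewer outputs: the sequence is subsampled
print(len(dilated), dilated)  # nearly full resolution, taps two steps apart
```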
