by swairshah 2699 days ago
I don't understand what you mean by 'non-locality of convolutions'. Isn't convolution inherently a local operation? Isn't that probably one of the main reasons CNNs are biased towards texture [0] rather than shape?

[0] https://openreview.net/forum?id=Bygh9j09KX


Convolutions in a hierarchy of layers, especially with dilated convolutions, provide long-range connections between inputs (handwavily logarithmic). In an RNN, they are separated by however many steps in a linear way, so gradients more easily vanish. Some paper which I do not recall examined them side by side and found that RNNs quickly forget inputs, even with LSTMs, and this means their theoretically unlimited long-range connections between inputs via their hidden state don't wind up being that useful.
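To make the "handwavily logarithmic" part concrete, here is a rough sketch of the receptive-field arithmetic for a stack of 1-D dilated convolutions. The kernel size of 2 and the doubling dilation schedule (1, 2, 4, ...) are assumptions picked for illustration, not something stated above, and the function names are made up.

```python
def receptive_field(kernel_size, dilations):
    """Number of input positions that one output of the stack can see."""
    rf = 1
    for d in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation inputs.
        rf += (kernel_size - 1) * d
    return rf


def layers_needed(span, kernel_size=2):
    """Layers required to cover `span` inputs with dilations 1, 2, 4, ..."""
    layers, rf = 0, 1
    while rf < span:
        rf += (kernel_size - 1) * (2 ** layers)
        layers += 1
    return layers


doubling = [2 ** i for i in range(10)]       # assumed schedule: 1, 2, 4, ..., 512
print(receptive_field(2, doubling))          # 1024
print(receptive_field(2, [1] * 10))          # 11 (same depth, no dilation)
print(layers_needed(1024))                   # 10
```

Under those assumptions, ten layers link inputs 1023 positions apart, while a plain RNN would pass the same information through 1023 sequential hidden-state updates.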
In an RNN, you could connect each hidden state at time step t, h(t) to h(t-N), instead of, or in addition to, h(t-1), making it analogous to dilated convolutions, but with hidden-to-hidden connections at the same layer.

So I don't think RNNs are fundamentally more myopic than CNNs (just that there may be practical advantages to using the latter)
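A minimal sketch of what that could look like, assuming a scalar tanh RNN with fixed, made-up weights (`w_x`, `w_prev`, `w_skip` are hypothetical and nothing is trained here). `hops` just counts the fewest hidden-to-hidden edges between two time steps when both the h(t-1) and h(t-N) connections exist.

```python
import math


def skip_rnn(xs, skip, w_x=0.5, w_prev=0.3, w_skip=0.3):
    """Scalar RNN where h(t) depends on both h(t-1) and h(t-skip)."""
    hs = []
    for t, x in enumerate(xs):
        h_prev = hs[t - 1] if t >= 1 else 0.0       # usual recurrence
        h_skip = hs[t - skip] if t >= skip else 0.0  # extra long-range edge
        hs.append(math.tanh(w_x * x + w_prev * h_prev + w_skip * h_skip))
    return hs


def hops(distance, skip):
    """Fewest hidden-to-hidden edges linking two steps `distance` apart."""
    # Take as many skip-sized jumps as fit, then finish with single steps.
    return distance // skip + distance % skip


states = skip_rnn([1.0] * 64, skip=32)
print(len(states))       # 64
print(hops(1000, 1))     # 1000 (plain RNN)
print(hops(1000, 32))    # 39
```

A single skip length only shortens the path by a constant factor. Getting something closer to the logarithmic picture would presumably need several layers with different values of N, which this sketch does not attempt.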

Hierarchical RNNs, Clockwork RNNs and Hierarchical Multiscale RNNs and probably others are doing things of this nature.

You could, but it's not equivalent, and no one seems to have been able to use clockwork RNNs or related archs to achieve similar performance, so the differences would seem to make a difference.
Right. I'm just saying that this myopia is not a fundamental property of the recurrence any more than of convolution.

Clockwork RNNs subsample, BTW, so they are more analogous to stride=2 in CNNs than to dilation.
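To illustrate the stride vs. dilation distinction on the CNN side only (this says nothing about how Clockwork RNNs are actually implemented), here is a small sketch listing which input indices each output of a kernel-size-2 convolution reads. The function names and the length-8 input are made up for the example.

```python
def strided_taps(length, kernel_size=2, stride=2):
    """Input indices read by each output of a strided convolution."""
    last_start = length - kernel_size
    return [[start + k for k in range(kernel_size)]
            for start in range(0, last_start + 1, stride)]


def dilated_taps(length, kernel_size=2, dilation=2):
    """Input indices read by each output of a dilated convolution."""
    span = (kernel_size - 1) * dilation
    return [[start + k * dilation for k in range(kernel_size)]
            for start in range(0, length - span)]


print(strided_taps(8))   # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(dilated_taps(8))   # [[0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7]]
```

Stride drops output positions (4 outputs from 8 inputs), which is the subsampling. Dilation keeps an output at nearly every position and only spreads out the taps.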

That’s an awful lot of woo to describe something “theoretical” in the sense of being imaginary but not theoretical in the sense of ever proven in a rigorous way mathematically.

Just some papers, you know?

We’re so screwed.