


	by mrcoder111 2736 days ago
	How do you handle variable-length input without something like an RNN? Even transformers use RNN structures, right? I suppose convolutions could technically handle variable-length inputs (just slide the window of weights over inputs of different lengths), but I don't think TensorFlow or PyTorch supports this.

2 comments

phowon 2736 days ago

>Even transformers use RNN structures right.

Nope.

>How do you handle variable length input without something like an RNN?

Any form of pooling, really: max, avg, sum. The tricky part is how to do the pooling while still taking advantage of the sequential structure of the input. Transformer-based models have shown that you can get away with providing very little order information and still go very far.
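
To make the pooling point concrete, here is a minimal sketch in plain Python (no framework). The `pool` helper and the toy sequences are made up for illustration; the only assumption is that every time step is a vector of the same width.

```python
def pool(sequence, mode="mean"):
    """Collapse a list of equal-width vectors into one vector of that width."""
    columns = list(zip(*sequence))  # one tuple per feature dimension
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "sum":
        return [sum(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown mode: {mode}")


short = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]  # 3 time steps, 2 features
long = [[float(t), 1.0] for t in range(10)]   # 10 time steps, 2 features

for seq in (short, long):
    # The output width is 2 in both cases, regardless of sequence length.
    print(len(seq), "steps ->", pool(seq, "mean"), pool(seq, "max"))
```

Both sequences come out as 2-element vectors, which is what lets a fixed-size layer sit on top. The reduction also throws away ordering, which is the lossy part.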


mrcoder111 2736 days ago

Samcodes said it above. How do transformers build a shared representation of two input sentences with different lengths? If you convolve them with the same filter, you get two different sized convolution outputs - the embedding dimensions don't align.


phowon 2736 days ago

Like I said - pooling.

You can take the mean over 3 elements or 10 elements just the same. Pooling is lossy, but it seems that if you have the right architecture the model can still learn what it needs to.

It's worth noting that the attention mechanism (at least in RNNs) has always been invariant to input length. It's a weighted sum with weights computed per element, so there's no length constraint at all.
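
A rough sketch of that weighted sum, in plain Python. It assumes dot-product scoring against a single query vector, which is a simplification chosen for brevity; attention in RNN models often scores with a small learned network instead. The `attend` name and the toy data are hypothetical.

```python
import math


def attend(query, states):
    """Score each state against the query, softmax the scores, return the weighted sum."""
    # One score per element, computed independently of how many elements there are.
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    peak = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # normalized over all elements, sums to 1
    width = len(states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(width)
    ]
    return weights, context


query = [1.0, 0.0]
three = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ten = [[t / 10.0, 1.0 - t / 10.0] for t in range(10)]

for states in (three, ten):
    weights, context = attend(query, states)
    print(len(states), "states ->", len(weights), "weights, context width", len(context))
```

The number of weights follows the number of states, while the context vector keeps the width of a single state. Nothing in `attend` has a parameter whose shape depends on the sequence length.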


mrcoder111 2735 days ago

Can you share some paper names or links to architectures that demonstrate the length invariant convolution and attention?


phowon 2735 days ago

I'm not sure if you're understanding me correctly.

Attention is generally length invariant. You apply some transformation to the hidden representations (and/or the inputs) at each time step, and then you normalize over all the transformed values to get weights that sum to one. No part of this is constrained by length.

For CNNs, any network that has pooling has the potential to be length/dimension invariant. Whether it actually is depends on both the architectural design and implementation details. For example, some implementations define the pooling operation over a fixed window, say 9x9, when the same operation could be defined over a variable-dimension window.

Length/dimension invariance isn't a special or novel property. In the case of attention it's built in. In the case of CNNs, the convolutions are not length invariant, but depending on the architecture, the pooling operations are (or can be modified to be).
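
The difference between a fixed pooling window and one that spans the whole input can be sketched like this, in one dimension and plain Python. Both helper names are hypothetical, and the inputs stand in for a single channel of convolution output.

```python
def max_pool_fixed(values, window):
    """Non-overlapping max pooling with a fixed window; output length tracks input length."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def max_pool_global(values):
    """Max pooling whose window is the whole input; output is always one value."""
    return [max(values)]


six = list(range(6))
twelve = list(range(12))

for values in (six, twelve):
    fixed = max_pool_fixed(values, window=3)
    pooled = max_pool_global(values)
    print(len(values), "inputs -> fixed window:", len(fixed), "outputs, global:", len(pooled))
```

With the fixed window, doubling the input doubles the output, so whatever comes next still sees a variable size. With the global window the output size is constant, which is the modification being described.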


mrcoder111 2734 days ago

In order to get a variable-length context, you need to add some machinery to some forms of attention. For example, in "Jointly Learning to Align and Translate", the attention is certainly not invariant to the number of context vectors. You train the attention to take in a fixed number of context vectors and produce a distribution over that fixed number of context vectors. You cannot train on images with 5 annotations/context vectors and expect anything to transfer to a setting with 10 annotations. That's why I would be interested in a specific paper to solidify what you're saying.


samcodes 2736 days ago

The hard part is that after the convolutions you want a fully connected layer or two, and to get those dimensions right you need to know the input dimensions. But PyTorch builds the graph at runtime, so maybe you could do this...
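
One way around the fully connected layer's dimension problem, sketched in plain Python rather than any framework: pool each feature map down to a single number before the dense layer, so its input width equals the number of filters. The `conv1d` and `forward` helpers, the filters, and the weights are all hypothetical and untrained, and the input is assumed to be at least as long as the widest filter.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation; output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def forward(signal, kernels, dense_weights, dense_bias):
    # One feature map per kernel; their lengths depend on the input length.
    feature_maps = [conv1d(signal, kernel) for kernel in kernels]
    # Global max pooling keeps one number per kernel, so the dense layer's
    # input width is len(kernels) no matter how long `signal` was.
    pooled = [max(fmap) for fmap in feature_maps]
    return sum(w * x for w, x in zip(dense_weights, pooled)) + dense_bias


kernels = [[1.0, -1.0], [0.5, 0.5, 0.5]]  # made-up filters of width 2 and 3
dense_weights = [2.0, 1.0]                # one weight per kernel
dense_bias = 0.1

print(forward([1.0, 3.0, 2.0, 5.0], kernels, dense_weights, dense_bias))
print(forward([0.0, 1.0, 0.0, 2.0, 4.0, 1.0, 3.0], kernels, dense_weights, dense_bias))
```

The same `dense_weights` serve inputs of length 4 and length 7. In PyTorch the pooling step would correspond to something like `torch.nn.AdaptiveMaxPool1d(1)`, though that mapping is a suggestion rather than anything stated in this thread.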
