| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hadsed 2490 days ago
	If you think a longer context length might be helpful consider stacking convolutions to give higher units a bigger receptive field, or try the convolutional LSTM. If that helps and you have a further argument for why an even larger context window would be helpful then perhaps try attention and in that case a Transformer would be reasonable. But your stacked conv net would be the fastest and most obvious thing that should work (with the caveat that I know nothing else about your data and it's characteristics, which is a really big caveat). Consider looking at your errors and judging whether they stem from things your current model doesn't do well but that Transformers do, i.e., correlating two steps in a sequence across a large number of time steps. Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.

1 comments

ccccppppp 2490 days ago

Thanks for the insight, also for mentioning convolutional LSTM, I wasn't aware such a thing existed.

> Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.

But aren't CNNs also like a memory module (ie: they memorize how leopard skin looks like)? I guess attention is a more sophisticated kind of memory, "more dynamic" so to speak.

Anyway, I'm glad to hear that a transformer architecture isn't totally stupid for my task, I will look up the literature, there seems to be a bit on this matter.

link

hadsed 2477 days ago

Yeah, in some sense any layer is a "memory module". Perhaps more specifically, attention solves the problem of directly correlating two items in a sequence that are very, very far away from each other. I'd generally caution against using attention prematurely as it's extremely slow, meaning you'll waste a lot of your time and resources without knowing if it'll help. Stacking conv layers or using recurrence is an easy middle step that, if it helps, can guide you on whether attention could provide even more gains.

link