| HN Mirror



	by jeeceebees 1846 days ago
	LSTM stands for Long Short Term Memory. It's a recurrent network that learns what and how long things should be kept in its internal state buffer. It doesn't have a fixed state size because it's just learning a nonlinear function that takes an input and a state to an output and a new state. Obviously it can't model all possible, infinite length recurrences, but it can definitely do a pretty good job of approximating long term recurrence relations in complex signals.

1 comments

lunixbochs 1846 days ago

I don't think that assessment is quite right. The hidden size is fixed: the second argument to PyTorch's nn.LSTM constructor is "hidden_size – The number of features in the hidden state h".

A call to `y, hidden = layer.forward(x)` (where x has a batch size of 1 and an arbitrary length) returns `hidden` as a pair of states, each of shape `(1, 1, hidden_size)` for a single-layer, unidirectional LSTM, where hidden_size is the exact number you passed to the LSTM constructor. Those two states are the cell state (long term memory) and the hidden state (short term memory).
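To make the shape claim concrete, here is a minimal sketch, assuming a single-layer, unidirectional nn.LSTM with one input feature per time step. The helper name `final_state_shapes` and the sequence lengths are made up for illustration; the hidden size of 20 is just a placeholder value.

```python
import torch
import torch.nn as nn


def final_state_shapes(seq_len, hidden_size=20, input_size=1):
    """Run a random sequence through an LSTM and report the tensor shapes.

    Hypothetical helper for illustration: batch size is fixed at 1 and the
    default (seq_len, batch, input_size) layout is used.
    """
    layer = nn.LSTM(input_size, hidden_size)
    x = torch.randn(seq_len, 1, input_size)
    with torch.no_grad():
        # The second return value is a tuple: (hidden state h_n, cell state c_n).
        y, (h_n, c_n) = layer(x)
    return tuple(y.shape), tuple(h_n.shape), tuple(c_n.shape)


short = final_state_shapes(10)
long = final_state_shapes(1000)

# The per-step output grows with the sequence; the carried state does not.
print("10 steps:  ", short)
print("1000 steps:", long)
```

The output `y` has one row per time step, but the state handed to the next call stays at `(1, 1, hidden_size)` no matter how long the input was.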

You would need to have an LSTM with hidden_size large enough to store the samples (or a compressed representation) of your entire loop. Not to mention you'd run into other issues with handling the logic around variable length loops based on a pedal toggle.


jeeceebees 1846 days ago

The hidden state isn't storing the samples of your loop (or a compressed version of your loop). It's encoding a representation of how the output will change based on what the current state and input are. This might be strongly dependent on what the exact samples in the loop are, but it could also be more general. I think it's missing a bit of the representational power of an LSTM to see the state representation as just a buffer of the current input.

But, yeah, at some point your signal has such a complex behavior on long time scales that there isn't a good way to predict it based on a limited state size (or at least gradient descent can't find a function to predict it for you).


lunixbochs 1845 days ago

If you can reproduce the original information based only on a state input, you have stored it in the state (in an encoded form or not). If your state is smaller than the original information, you have compressed it. If your reproduction is not faithful to the original, you have created lossy compression.

If the future input samples have a meaningful impact during loop playback, then it hasn't learned the correct behavior of the original loop pedal.

Note that the linked project appears to use a hidden size of 20. Twenty floats. With that much space we're very much back to "sure, you might theoretically be able to loop if the information fits in the hidden size".
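For a rough sense of scale, here is a back-of-the-envelope sketch comparing the raw size of the LSTM state to one second of audio. Everything except the hidden size of 20 is an assumption on my part: 32-bit floats, counting both the hidden and cell state, and 16-bit mono audio at 44.1 kHz. The linked project may use different values.

```python
def lstm_state_bits(hidden_size, n_states=2, bits_per_float=32):
    """Raw bits in the carried LSTM state (assumed: h and c, float32)."""
    return hidden_size * n_states * bits_per_float


def loop_bits(seconds, sample_rate=44_100, bits_per_sample=16, channels=1):
    """Raw bits in an uncompressed PCM loop (assumed audio format)."""
    return int(seconds * sample_rate) * bits_per_sample * channels


state = lstm_state_bits(20)   # 20 * 2 * 32
loop = loop_bits(1.0)         # 44100 * 16
ratio = loop / state

print(f"state: {state} bits, 1 s loop: {loop} bits, ratio: {ratio:.2f}x")
```

Under those assumptions a one second loop is a few hundred times larger than the state, so anything the network "loops" from state alone would have to be very heavily (and lossily) compressed.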

Increasing the hidden size beyond 20 still won't solve learning the complex state machine behavior of an original loop pedal, which can loop variable length audio. You'd need to provide the pedal state to the network in addition to the audio, and you'd probably need to train it on a bunch of different loop lengths (>thousands?).
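For reference, the state machine in question is tiny when written by hand. This is a toy sketch, not how any real pedal is implemented: the class name `LoopPedal`, the three states, and the single-toggle control are all hypothetical. It does show the two things the network would have to learn, namely an explicit toggle input and a buffer whose length depends on when the toggle was pressed.

```python
class LoopPedal:
    """Toy looper: toggle once to record, again to loop, again to stop."""

    def __init__(self):
        self.state = "idle"
        self.buffer = []   # recorded samples; length is set by the player
        self.pos = 0       # playback position inside the buffer

    def toggle(self):
        if self.state == "idle":
            self.state = "recording"
            self.buffer = []
        elif self.state == "recording":
            # An empty recording has nothing to loop, so fall back to idle.
            self.state = "playing" if self.buffer else "idle"
            self.pos = 0
        else:
            self.state = "idle"

    def process(self, sample):
        """Consume one input sample and return one output sample."""
        if self.state == "recording":
            self.buffer.append(sample)
            return sample
        if self.state == "playing":
            # Dry input plus the stored loop; the loop itself never changes.
            out = sample + self.buffer[self.pos]
            self.pos = (self.pos + 1) % len(self.buffer)
            return out
        return sample


pedal = LoopPedal()
pedal.toggle()                      # start recording
for s in [1, 2, 3]:
    pedal.process(s)
pedal.toggle()                      # start looping a 3-sample buffer
looped = [pedal.process(0) for _ in range(7)]
print(looped)
```

Here the loop length (3 samples) comes entirely from the timing of the two toggles, which is why the audio alone isn't enough input for a model to reproduce the behavior.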

This would mostly be an academic pursuit, as it's extremely impractical compared to the other uses of the device.
