by lunixbochs 1846 days ago
I don't think that assessment is quite right. The hidden size is fixed - the second argument to PyTorch's nn.LSTM constructor is "hidden_size – The number of features in the hidden state h".

A call to `y, hidden = layer.forward(x)` (where x has a batch size of 1 and an arbitrary length) returns `hidden` as a pair of states, each of shape `(1, 1, hidden_size)` for a single-layer, unidirectional LSTM, where hidden_size is the exact number you passed to the LSTM constructor. Those two states are the hidden state h and the cell state c, which represent the short term and long term memory features respectively.
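To make the shape claim concrete, here is a minimal sketch assuming a single-layer, unidirectional `nn.LSTM` with the default `batch_first=False`. The `input_size=1` (mono audio) and `hidden_size=20` values and the helper name `final_state_shapes` are illustrative assumptions, not taken from any particular project.

```python
import torch
from torch import nn

HIDDEN_SIZE = 20  # assumed value, chosen only for illustration

# Single-layer, unidirectional LSTM taking one feature per time step.
layer = nn.LSTM(input_size=1, hidden_size=HIDDEN_SIZE)


def final_state_shapes(seq_len):
    """Run a random sequence of the given length and report tensor shapes."""
    # Default layout is (seq_len, batch, input_size); batch size is 1 here.
    x = torch.randn(seq_len, 1, 1)
    with torch.no_grad():
        y, (h_n, c_n) = layer(x)
    return tuple(y.shape), tuple(h_n.shape), tuple(c_n.shape)


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, final_state_shapes(n))
```

The output `y` grows with the sequence length, but `h_n` and `c_n` should stay at `(1, 1, hidden_size)` however long the input is.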

You would need to have an LSTM with hidden_size large enough to store the samples (or a compressed representation) of your entire loop. Not to mention you'd run into other issues with handling the logic around variable length loops based on a pedal toggle.

1 comments

The hidden state isn't storing the samples of your loop (or a compressed version of your loop). It's encoding a representation of how the output will change based on what the current state and input are. This might be strongly dependent on what the exact samples in the loop are, but it could also be more general. I think seeing the state representation as just a buffer of the current input undersells the representational power of an LSTM.

But, yeah, at some point your signal has such complex behavior on long time scales that there isn't a good way to predict it based on a limited state size (or at least gradient descent can't find a function to predict it for you).

If you can reproduce the original information based only on a state input, you have stored it in the state (in an encoded form or not). If your state is smaller than the original information, you have compressed it. If your reproduction is not faithful to the original, you have created lossy compression.

If the future input samples have a meaningful impact during loop playback, then it hasn't learned the correct behavior of the original loop pedal.

Note that the linked project appears to use a hidden size of 20. Twenty floats. With that much space we're very much back to "sure, you might theoretically be able to loop if the information fits in the hidden size".
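For a rough sense of scale, the arithmetic can be sketched as below. The 44.1 kHz sample rate and the 5 second loop are assumptions picked for illustration, and the function names are hypothetical. The sketch counts both LSTM states (h and c), so a hidden size of 20 gives 40 values in total.

```python
SAMPLE_RATE_HZ = 44_100  # assumed audio sample rate
HIDDEN_SIZE = 20         # hidden size quoted for the linked project


def loop_samples(seconds, sample_rate=SAMPLE_RATE_HZ):
    """Number of audio samples in a loop of the given duration."""
    return int(round(seconds * sample_rate))


def state_floats(hidden_size=HIDDEN_SIZE, num_states=2):
    """Floats carried between steps: h and c each hold hidden_size values."""
    return hidden_size * num_states


def required_compression(seconds, hidden_size=HIDDEN_SIZE):
    """How many samples each state float would have to account for."""
    return loop_samples(seconds) / state_floats(hidden_size)


if __name__ == "__main__":
    print(loop_samples(5))          # samples in an assumed 5 second loop
    print(state_floats())           # floats available in the recurrent state
    print(required_compression(5))  # samples per state float
```

Under these assumptions, a 5 second loop is 220,500 samples against 40 state values, so the state alone could hold less than a millisecond of raw audio.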

Increasing the hidden size beyond 20 still won't solve learning the complex state machine behavior of an original loop pedal, which can loop variable length audio. You'd need to provide the pedal state to the network in addition to the audio, and probably need to train it on a bunch of different loop lengths (>thousands?).
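As a reference for the behavior the network would have to learn, here is a minimal sketch of a loop pedal as an explicit state machine. The three modes, the single toggle input, and the names `LoopPedal` and `run` are simplifying assumptions, not a model of any real pedal.

```python
class LoopPedal:
    """Toy looper: toggle to record, toggle again to play, toggle again to stop."""

    def __init__(self):
        self.mode = "idle"
        self.buffer = []  # grows with the loop length, unlike a fixed-size state
        self.pos = 0

    def step(self, sample, toggle):
        """Process one input sample plus the pedal toggle for this time step."""
        if toggle:
            if self.mode == "idle":
                self.mode, self.buffer = "recording", []
            elif self.mode == "recording":
                self.mode, self.pos = "playing", 0
            else:
                self.mode = "idle"

        if self.mode == "recording":
            self.buffer.append(sample)
            return sample  # pass the input through while recording
        if self.mode == "playing":
            out = self.buffer[self.pos]
            self.pos = (self.pos + 1) % len(self.buffer)
            return out  # playback ignores the current input entirely
        return sample  # idle: pass-through


def run(samples, toggles):
    """Feed paired (sample, toggle) sequences through a fresh pedal."""
    pedal = LoopPedal()
    return [pedal.step(s, t) for s, t in zip(samples, toggles)]


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 9.0, 9.0, 9.0, 9.0]
    toggles = [True, False, False, True, False, False, False]
    print(run(samples, toggles))
```

In this sketch the toggle is an explicit per-step input alongside the audio, which is the kind of pedal state signal the network would need, and the buffer length depends on when the second toggle arrives.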

This would mostly be an academic pursuit, as it's extremely impractical compared to the other uses of the device.