by Fripplebubby
716 days ago
Love this post! It gets into the details of what it _really_ means to take some function and turn it into an RNN, and compares that to the "batteries included" RNNs that ship with PyTorch, as a learning experience. Question:

> To model the state, we need to add three hidden layers to the network.

How did you determine that it would be three hidden layers? Is it a consequence of the particular rule you were implementing, or is that generally how many layers you would use to implement a rule of this shape (using your architecture rather than Elman's; could we use fewer layers with Elman's)?
For your first question, using three hidden layers makes it a little clearer what the network does. Each layer performs one step of the calculation. The first layer collects what is known from the current token and what we knew after the calculation for the previous token. The second layer decides whether the current token looks like program code, by checking if it satisfies the decision rule. The third layer compares the decision with what we decided for previous tokens.
I think that this could be compressed into a single hidden layer, too. A single ReLU layer should be expressive enough to capture the non-linearity, so I'd expect this to work.
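
To make the three steps concrete, here is a minimal sketch in plain Python. It is not the code from the post: the two binary token features, the "both features present" decision rule, and the names `step`, `step_single`, and `run` are all assumptions made up for illustration. Each stage is a linear map followed by a ReLU, mirroring the collect / decide / compare split described above.

```python
def relu(values):
    """Element-wise ReLU over a list of numbers."""
    return [max(0.0, v) for v in values]


def step(token, prev_decision):
    """One recurrent step with three ReLU stages (hypothetical toy rule).

    token: pair of binary features (a, b), assumed for illustration.
    prev_decision: 1.0 if the previous token looked like code, else 0.0.
    Returns (output, new_state).
    """
    a, b = token
    # Layer 1: collect the current token's features and the carried state.
    h1 = relu([a, b, prev_decision])
    # Layer 2: apply the assumed decision rule "a AND b" as relu(a + b - 1),
    # and carry the previous decision through unchanged.
    h2 = relu([h1[0] + h1[1] - 1.0, h1[2]])
    # Layer 3: compare with the previous decision (AND of both),
    # and pass the current decision on as the new state.
    output, new_state = relu([h2[0] + h2[1] - 1.0, h2[0]])
    return output, new_state


def step_single(token, prev_decision):
    """Same toy rule folded into a single ReLU layer."""
    a, b = token
    output, new_state = relu([a + b + prev_decision - 2.0, a + b - 1.0])
    return output, new_state


def run(tokens, step_fn=step):
    """Feed a token sequence through the recurrence, starting from state 0."""
    state, outputs = 0.0, []
    for token in tokens:
        out, state = step_fn(token, state)
        outputs.append(out)
    return outputs


tokens = [(1, 1), (1, 1), (0, 1), (1, 1)]
print(run(tokens))               # three-stage version
print(run(tokens, step_single))  # single-layer version
```

For this toy rule with binary inputs, the three stages and the single-layer version produce the same outputs, which is the kind of compression the reply has in mind. Whether the same holds for the rule in the post depends on that rule, which this sketch does not reproduce.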