by immibis
623 days ago
On the surface, this architecture seems to preclude the basic function of recognizing sequences of tokens. At the very least, it seems like it should suffer from something like the pumping lemma: if [the ][cat ][is ][black ] pushes the output close to a certain vector, then shouldn't [the ][cat ][is ][black ][the ][cat ][is ][black ][the ][cat ][is ][black ] push it even closer to that vector, and nowhere near a "why did you just repeat the same sentence three times" vector? Without non-linear mixing between the input token and the hidden state, there will be a lot of linear similarities between similar token sequences...
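
To make the worry concrete, here is a toy sketch, not the actual architecture under discussion. It assumes a purely linear state update, h_t = DECAY * h_{t-1} + x_t, with a single scalar decay and no non-linearity anywhere. The embeddings in EMBED, the DECAY value, and the function names are all made up for illustration.

```python
import math

# Hypothetical 3-dimensional token embeddings, invented for this sketch.
EMBED = {
    "the": [1.0, 0.0, 0.5],
    "cat": [0.0, 1.0, -0.5],
    "is": [0.5, 0.5, 0.0],
    "black": [-1.0, 0.25, 1.0],
}
DECAY = 0.9  # assumed scalar decay applied to the whole hidden state


def run_linear(tokens, decay=DECAY):
    """Run the toy linear recurrence h_t = decay * h_{t-1} + x_t from h_0 = 0."""
    h = [0.0, 0.0, 0.0]
    for tok in tokens:
        x = EMBED[tok]
        h = [decay * h_i + x_i for h_i, x_i in zip(h, x)]
    return h


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sentence = ["the", "cat", "is", "black"]
once = run_linear(sentence)
thrice = run_linear(sentence * 3)

# By linearity, the state after three repetitions is the single-pass state
# multiplied by 1 + DECAY**4 + DECAY**8 (one term per repetition).
scale = 1 + DECAY ** len(sentence) + DECAY ** (2 * len(sentence))
similarity = cosine(once, thrice)

print("state after one pass:    ", once)
print("state after three passes:", thrice)
print("cosine similarity:       ", similarity)
```

In this toy, the state after three repetitions is just a scaled copy of the state after one, so the cosine similarity is 1 up to floating-point rounding, and any linear readout only sees "the same thing, but more so". Real architectures of this kind typically add input-dependent gating or non-linear layers between recurrences, so this only illustrates the concern and says nothing about whether it holds in practice.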