Hacker News
by Fripplebubby, 715 days ago
Ah, that makes sense. So we treat the two hidden layers more as "memory" or "buffers", and the rule itself is actually implemented in just one layer, at least for a single token.