Hacker News
by Fripplebubby 715 days ago
Ah, that makes sense. So we can treat two hidden layers more as "memory" or "buffers", and the rule is actually implemented in just one layer, at least for a single token.