by pdxww
2610 days ago
T1 indeed contains all the info needed, but T1 also has limited capacity and can't capture long patterns. T1 would need hundreds of billions of weights to capture minute-long patterns. I think this idea is similar to the often-used skip connections.

Why would you try to manually duplicate this process by creating F1, F2, etc.?
A skip connection would be like feeding T1's output to T3 as well as to T2. Again, I’m not sure what useful info the F sequences would supply in this scenario.
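
To make the skip-connection comparison concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `scale` is a hypothetical stand-in for a trained layer, the names `forward_plain` and `forward_skip` are made up, and merging by element-wise addition is just one common choice (concatenation is another). It only shows the wiring where T3 sees T1's output alongside T2's.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def scale(factor: float) -> Layer:
    """Hypothetical stand-in for a trained layer: multiplies each value by `factor`."""
    return lambda xs: [factor * x for x in xs]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def forward_plain(t1: Layer, t2: Layer, t3: Layer, x: Vector) -> Vector:
    """Plain stack: T3 only sees what T2 passes on."""
    return t3(t2(t1(x)))


def forward_skip(t1: Layer, t2: Layer, t3: Layer, x: Vector) -> Vector:
    """Skip connection: T1's output goes to T2 and is also added to T3's input."""
    h1 = t1(x)
    h2 = t2(h1)
    return t3(add(h2, h1))


t1, t2, t3 = scale(2.0), scale(3.0), scale(1.0)
plain_out = forward_plain(t1, t2, t3, [1.0, 2.0])
skip_out = forward_skip(t1, t2, t3, [1.0, 2.0])
```

With these toy layers, the skip version differs from the plain one only by the extra T1 term reaching T3, which is the "in addition to T2" part described above.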