by in-silico
165 days ago
Why can't you just leave H_res as the identity matrix (or just not use it at all)? In that case, the model is basically a ResNet again and you don't need to worry about exploding/vanishing gradients from H_res. I would think that H_post and H_pre could cover the lost expressiveness.
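
To make the question concrete, here is a toy sketch of what I mean. It assumes the usual hyper-connection update, streams_next = H_res @ streams + H_post * f(H_pre @ streams), which is my reading of the setup and not something quoted from the paper. The function names, the two-stream size, and the 0.6 mixing weights are all made up for illustration. The layer function f is set to zero so only the skip path acts on the signal.

```python
def hc_layer(streams, h_pre, h_post, h_res, f):
    """One hyper-connection step on n scalar streams (toy case, width 1).

    Assumed update (hypothetical, for illustration only):
        new[i] = sum_j h_res[i][j] * streams[j] + h_post[i] * f(h_pre . streams)
    """
    n = len(streams)
    layer_in = sum(h_pre[j] * streams[j] for j in range(n))
    layer_out = f(layer_in)
    return [
        sum(h_res[i][j] * streams[j] for j in range(n)) + h_post[i] * layer_out
        for i in range(n)
    ]


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def run(h_res, depth=50):
    """Push two unit streams through `depth` layers with the same h_res."""
    streams = [1.0, 1.0]
    for _ in range(depth):
        # f returns 0, which isolates the skip path: only h_res touches the signal.
        streams = hc_layer(streams, [0.5, 0.5], [1.0, 1.0], h_res, lambda x: 0.0)
    return streams


# H_res = I: each stream is a plain ResNet-style skip connection.
skip_identity = run(identity(2))

# Unconstrained H_res with row sums of 1.2: the skip path is scaled at every layer.
skip_mixing = run([[0.6, 0.6], [0.6, 0.6]])

print(skip_identity, skip_mixing)
```

With the identity, the streams pass through all 50 layers unchanged, while the unconstrained matrix compounds its scaling at every layer. That only shows the stability side of the tradeoff; it says nothing about whether H_pre and H_post alone recover the expressiveness.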