by matusp 251 days ago
> The way I think of these transformations, but happy to be corrected, is more a matter of adding information rather than modifying

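This is very much the case, given the residual connections within the model. The final representation can be expressed as the input embedding plus the sum of the outputs of the N layers, where the N-th layer's output is a function of the representation produced after layer N-1. In other words, x_N = x_0 + f_1(x_0) + f_2(x_1) + ... + f_N(x_{N-1}), so each layer adds to what is already there rather than overwriting it.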
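A toy sketch of that decomposition, assuming each layer is a hypothetical stand-in (a fixed random linear map plus tanh) for a real attention or MLP block, and ignoring layer norm. It runs a residual forward pass, records each layer's contribution, and checks that the final state is exactly the input plus the sum of those contributions.

```python
import math
import random


def add(a, b):
    # Element-wise vector addition; returns a new list.
    return [x + y for x, y in zip(a, b)]


def make_layer(dim, seed):
    # Hypothetical toy layer: fixed random linear map followed by tanh.
    rng = random.Random(seed)
    w = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        return [math.tanh(sum(w[i][j] * x[j] for j in range(dim))) for i in range(dim)]

    return layer


def forward(x0, layers):
    """Residual forward pass: x_n = x_{n-1} + f_n(x_{n-1}).

    Returns the final state and the list of per-layer contributions.
    """
    x = x0
    contributions = []
    for f in layers:
        delta = f(x)            # depends on everything added so far
        contributions.append(delta)
        x = add(x, delta)       # residual: add to the stream, don't overwrite it
    return x, contributions


dim = 4
layers = [make_layer(dim, seed) for seed in range(3)]
x0 = [1.0, 0.0, -1.0, 0.5]
final, deltas = forward(x0, layers)

# Rebuild the final state as x0 plus the sum of the layer outputs.
rebuilt = x0
for d in deltas:
    rebuilt = add(rebuilt, d)

print(all(abs(a - b) < 1e-12 for a, b in zip(final, rebuilt)))  # prints True
```

Note that each `delta` is computed from the running sum, which is why the N-th term depends on all earlier ones even though the final result is a plain sum.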