by Bayes7
937 days ago
"[...] modern neural network (NN) architectures have complex designs with many components [...]" I find the Transformer architecture actually very simple compared to previous models like LSTMs or other recurrent models. You could argue that their vision counterparts like ViT are conceptually maybe even simpler than ConvNets? Also, can someone explain why they are so keen to remove the skip connections? At least when it comes to coding, nothing is simpler than adding a skip connection and computationally the effect should be marginal? |
This is especially true, for example, for vision transformer inference, where it decreases the batch size you can use before hitting the L2 capacity wall.