| > It turned out that architectural improvements really did matter
Transformer (2017). NET-based diffusion (2015). Score-based diffusion (2019). DDPM (2020), which uses U-Net (2015).
CLIP (~2021) uses ResNet (2015) + ViT (2020). Stable Diffusion also uses U-Net. Yes, deep learning is easy: throw compute at it and you get the answer. People with insight into the old problems are now churning out papers because they throw compute and deep learning at the things they already understood well. You can look into any paper and see inspiration from old methods. Applying deep learning is hard; deep learning itself is quite easy. If it were really hard, it wouldn't have become popular at all! Nowadays most people don't even bother with architectural gains. Sure, if a better architecture eventually comes up, people will throw compute at the new one. I don't believe in Bayesian stuff either. However, it is worth learning, otherwise you might miss insights in a lot of papers! Whatever you do, understanding the problem matters a lot. The deep learning and the compute come later. |
If we had unlimited compute and time for training, I don’t think we would’ve really moved on from dense feedforward nets.