by jackblemming
1162 days ago
Vision models are still CNNs trained with backprop. Now they’ve begun to incorporate transformers into vision tasks, but the results aren’t 10x better. Please explain to me how vision models today are massively different from AlexNet and not just a bunch of slight optimizations and tricks to eke out some marginal accuracy improvements.