by jackblemming 1162 days ago
Vision models are still CNNs trained with backprop. Now they’ve begun to incorporate transformers into vision tasks, but the results aren’t 10x better. Please explain to me how vision models today are massively different from AlexNet and not just a bunch of slight optimizations and tricks to eke out some marginal accuracy improvements.
1 comment

If you're talking about vision, fine. But you were replying to a comment about ChatGPT and then brought up AGI. No offence, but although some vision tasks are AI-hard, in general vision has little to do with AGI. That's why I quit the field. Transformers in vision may not be very interesting, but they certainly are a breakthrough in language.