by 0x20cowboy 2 days ago
> There's no 'rigorous comparison' that puts CNNs over ViTs

That's not accurate. My team wrote a paper for school in which a ResNet model outperformed a ViT model of the same size on almost all metrics. These were smaller models, but depending on the use case, that might be what you want.


Don't know if it's you (did you publish?). I read about something similar, but it had its issues:

- Tuning hyperparameters to gain improvement on a dataset when you're constantly looking at the answers (the labels you later report results on) is pretty meaningless. It's basically testing on the training data.

- Eval on ImageNet-1k alone (very small, useless for the real world) made me wonder if it wasn't just overfit to the training set. Would it perform better when trained on the datasets used for the foundation models? I doubt it.
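
To make the first point concrete, here's a rough sketch of the protocol I'd expect instead: pick hyperparameters on a held-out validation split and touch the test split exactly once. The names (`three_way_split`, `select_then_report`, `fit`, `score`) are made up for illustration and aren't from any paper or library; `fit` and `score` stand in for whatever training and eval code you actually have.

```python
import random


def three_way_split(n_items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle indices once and cut them into disjoint train/val/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(n_items * test_frac)
    n_val = int(n_items * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def select_then_report(configs, fit, score, train, val, test):
    """Pick the config with the best validation score, then score test once.

    fit(config, indices) -> model and score(model, indices) -> float are
    hypothetical callables supplied by the caller.
    """
    best_model, best_val = None, float("-inf")
    for config in configs:
        model = fit(config, train)
        val_score = score(model, val)  # selection only ever sees val
        if val_score > best_val:
            best_model, best_val = model, val_score
    # The test split is evaluated a single time, after selection is done.
    return best_val, score(best_model, test)
```

If the selection loop scored on `test` instead of `val`, the reported number would be the max over all configs on the very data you report, which is the "looking at the answers" problem.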

Well, I'm not saying CNNs are bad or useless, at any rate.

Exactly. Most of the comparison papers are useless. This is hard stuff, and only a few people have the chops to even attempt it. You can of course train some models and then post the numbers; that's not the hard part.