by D-Machine
175 days ago
No, not at all. The transformer obsession is quite possibly not supported by the facts: CNNs can still do just as well (https://arxiv.org/abs/2310.16764), and they definitely remain preferable for smaller and more specialized tasks (e.g. computer vision on medical data). For more robust and/or specialized settings (e.g. rotation-invariant computer vision models, graph neural networks, models working on point-cloud data, etc.), transformers are also not obviously the right choice, or even usable in the first place. So there are plenty of other useful architectures out there.
|
What about the DINOv2 and DINOv3 vision transformer models, at 1B and 7B parameters? This paper [1] suggests significant improvements over traditional YOLO-based object detection.
[1] https://arxiv.org/html/2509.20787v2