by hedgehog
414 days ago
SPECULATION ALERT! I think there's reasonable motivation though. In the last few years there has been a steady drip of papers in the general area, at least insofar as they use vision transformers and image pyramids, and work on applying RL to object detection goes back before that. IoU and the general way SSD and YOLO descendants are set up are kind of wacky, so I don't think it's much of a stretch to try to both 1) avoid attending to or materializing most of the pyramid, and 2) go directly to feature proposals without worrying about box anchors or grid cells or any of that. Now, with that context, if you still think it's a terrible idea, well, you're probably more current than I am.
Modern SSD/YOLO-style detectors use efficient feature pyramids; you need those to know where in the image to propose objects.
This sounds a lot like going back to old-school object detection techniques, which generally end up being much less compute-efficient.