by enoch2090 209 days ago
Looking forward to your progress! I just checked the paper, and it says the underlying backbone is still DETR. My guess is that SAM3 used more video frames during training, which diluted the sparse engineering-paper-like data.