by g_airborne 1970 days ago
I couldn't agree more, especially with the latter part. I've worked on action recognition with I3D for over a year now, and found that seemingly equivalent implementations in Keras, TensorFlow 2 or PyTorch produce wildly different results. Worse yet, I found a bunch of papers that claim SOTA results by comparing against one of those non-original implementations, with margins of just a few percentage points. It makes no sense! It took me hundreds of hours to hunt down the differences in how these frameworks implement their layers before I could come even close to the expected accuracy...
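To make the kind of layer-level difference described above concrete, here is a minimal sketch of one well-known example: padding for a strided convolution. This is only an illustration and not necessarily one of the differences the commenter ran into. It assumes a 224-pixel input with a kernel of 7 and a stride of 2, and the helper names (`tf_same_padding`, `symmetric_padding`, `output_size`) are hypothetical. TensorFlow's "SAME" rule can pad asymmetrically, while the common PyTorch idiom `padding=kernel // 2` pads the same amount on both sides.

```python
import math


def tf_same_padding(in_size, kernel, stride):
    """(before, after) padding following TensorFlow's documented 'SAME' rule."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before = total // 2          # any odd leftover goes to the end
    return before, total - before


def symmetric_padding(kernel):
    """(before, after) padding for the common PyTorch idiom padding=kernel // 2."""
    p = kernel // 2
    return p, p


def output_size(in_size, kernel, stride, pad):
    """Number of output positions along one axis for the given padding."""
    before, after = pad
    return (in_size + before + after - kernel) // stride + 1


# Assumed illustrative values, not taken from the thread.
in_size, kernel, stride = 224, 7, 2
tf_pad = tf_same_padding(in_size, kernel, stride)
pt_pad = symmetric_padding(kernel)

print("TF SAME padding:  ", tf_pad, "->", output_size(in_size, kernel, stride, tf_pad))
print("symmetric padding:", pt_pad, "->", output_size(in_size, kernel, stride, pt_pad))
```

Both variants give the same output size, so the shapes match and nothing crashes, but every window is shifted by one pixel relative to the other. Weights ported between the two see slightly different inputs unless the padding is reproduced explicitly.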

shameless ad: try mmaction2, where every result is reproducible https://github.com/open-mmlab/mmaction2 . Modelzoo: https://mmaction2.readthedocs.io/en/latest/modelzoo.html
This is very cool, I’ll be studying your implementation of I3D. Did you ever attempt to train I3D end-to-end as done in the Quo Vadis paper? And if so, did you get comparable Top1/Top5 accuracy?
Sure, checkpoints, configs and detailed training logs are all available at the modelzoo: https://mmaction2.readthedocs.io/en/latest/recognition_model...

The single RGB stream top-1 accuracy goes up to 73.48% with ResNet-50, and up to 74.71% when equipped with non-local blocks. Both are much higher than the two-stream results in the original paper.