Hacker News new | ask | show | jobs
by p13rr0m 2350 days ago
I believe the paper SlowFast Networks for Video Recognition goes somewhat in that direction. The architecture is split into a fast pathway, to capture motion information, and a slow pathway that captures spatial semantics. https://arxiv.org/pdf/1812.03982.pdf