by p13rr0m
2350 days ago
I believe the paper "SlowFast Networks for Video Recognition" goes somewhat in that direction. The architecture is split into two pathways: a fast pathway that captures motion information and a slow pathway that captures spatial semantics. https://arxiv.org/pdf/1812.03982.pdf
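To make the two-rate idea a bit more concrete, here is a rough Python sketch of how frames could be sampled for each pathway from the same clip. The function name `slowfast_sample` and the defaults (`tau=16` as the slow pathway's frame stride, `alpha=8` as how many times denser the fast pathway samples) are my own assumptions for illustration, not code taken from the paper.

```python
def slowfast_sample(num_frames, tau=16, alpha=8):
    """Return frame indices for a slow and a fast pathway (illustrative sketch).

    Assumed parameters (hypothetical defaults):
      tau   -- slow pathway keeps one frame every `tau` frames (low temporal rate)
      alpha -- fast pathway samples `alpha` times more densely than the slow one
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha")
    fast_stride = tau // alpha
    # Few frames: enough to recognize *what* is in the scene (spatial semantics).
    slow = list(range(0, num_frames, tau))
    # Many frames: fine temporal resolution to pick up *how* things move.
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


slow, fast = slowfast_sample(64)
print(len(slow), len(fast))  # 4 32
```

With these assumed values, every slow frame is also a fast frame, so the two pathways look at the same clip at different temporal resolutions, and their features can then be fused.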