Hacker News
by rightsForRobots 3505 days ago
Yes, even within a particular video there are lots of frames where the act is implied rather than directly shown, like a close-up of other people's faces. Karpathy et al. showed they could still learn from the sports video database even when random crowd shots and announcer shots were not removed.

I think the quality of the data influences the result, and hand-crafting the dataset is what led to 95% accuracy on new instances.