|
|
|
|
|
by threatripper
2073 days ago
|
|
How would one accelerate object tracking on a video stream where each frame depends on the result of the previous one? Batching and multi-threading doesn't work here. Are there some CNN-libraries that have way less overhead for small batch sizes? Tensorflow (GPU accelerated) seems to go down from 10000 fps on large batches to 200 fps for single frames for a small CNN. |
|
1. How many times is the data being copied, or moved between devices?
2. Are you recomputing data from previous frames that you could just be saving? For example, some tracking algorithms apply the same CNN tower to the last 3-5 images, and you could just save the results from the last frame instead of recomputing. (Of course, you also want to follow hint #1 and keep these results on the GPU).
3. Change the algorithm or network you're using.
Really you should read the original article carefully. The article is showing you the steps for profiling what part of the runtime is slow. Typically, once you profile a little you'll be surprised to find that time is being wasted somewhere unexpected.