by abainbridge
3113 days ago
It's not as simple as that. The problem is that YOLO can process ~100 FPS only when a frame is something like 250x250 pixels. If you want to process high resolution, you have to sweep a 250x250 "window" across the high-res image and run YOLO on each position. Then you have to figure out which objects were counted more than once due to the windowing. Even if the windows don't overlap (which they should), YOLO might detect the left half of a person in one window and the right half in the next. Once you've done all that, a top-end GPU can handle about 4 FPS (assuming 1080p input).

Then the problem is that YOLO will occasionally miss blindingly obvious objects. That, combined with the fact that you've only got 4 FPS, means that detecting the direction a person is moving is hard: they tend to move across the camera's field of view before you've got enough data to be confident about what just happened. A person walking from left to right looks the same as one person walking off the left of the frame and then someone different walking onto the right of the next frame.

At some point it's easier, cheaper and more accurate to install an IR laser beam and count the breaks. You'll save about half a kilowatt too.

Another interesting point is that YOLO is pre-trained on hundreds of object classes. This feels like a waste. I wanted to retrain it with all but the people class removed from the training set. My learned colleague suggested that was a stupid idea because YOLO learns general information about how to separate objects from backgrounds from all the object classes. Not showing it surfboards makes it worse at detecting people. Crazy.
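To make the windowing step concrete, here is a minimal sketch of tiling a frame and merging detections that were counted more than once. Everything in it is an assumption for illustration: the 50-pixel overlap, the 0.5 IoU threshold, the `(x1, y1, x2, y2, score)` box layout and the function names are all hypothetical, and the detector itself is left out. The merge uses plain greedy non-maximum suppression, which is a common choice but not necessarily what anyone in this thread used.

```python
import math

def tile_windows(width, height, win=250, overlap=50):
    """Return top-left (x, y) corners of win x win windows covering the image.

    win and overlap are assumed values, not taken from any YOLO config.
    """
    stride = win - overlap

    def starts(size):
        if size <= win:
            return [0]
        n = math.ceil((size - win) / stride) + 1
        # Clamp the last window so it ends exactly at the image edge.
        return [min(i * stride, size - win) for i in range(n)]

    return [(x, y) for y in starts(height) for x in starts(width)]

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def merge_detections(per_window, iou_threshold=0.5):
    """Shift window-local boxes into image coordinates, then drop duplicates.

    per_window: list of ((wx, wy), [(x1, y1, x2, y2, score), ...]).
    Greedy NMS: keep the highest-scoring box, discard boxes overlapping it.
    """
    boxes = []
    for (wx, wy), dets in per_window:
        for x1, y1, x2, y2, score in dets:
            boxes.append((x1 + wx, y1 + wy, x2 + wx, y2 + wy, score))
    boxes.sort(key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b[:4], k[:4]) < iou_threshold for k in kept):
            kept.append(b)
    return kept
```

With these assumed values a 1920x1080 frame needs 60 windows, so the detector runs 60 times per frame. Note that this only removes duplicates that overlap heavily. The "left half in one window, right half in the next" case produces two boxes with little or no overlap, so it would need extra logic, for example joining boxes that touch along a window boundary.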
|
OpenCV's FaceRecognizer class is one of the fastest I found, and it's what I used in my program.
First, it runs an LBP cascade to find "any face", including walls that happen to look like one. It has false positives but, thankfully, almost never false negatives. Then I take each region of interest and run a Haar cascade for eyes on it. If there's at least one eye in the region of interest, I pass it to the classifier.
From there, the classifier checks whether it already knows the face in that region. If it doesn't, it adds it. If it does, it adds this as another sample to further confirm the face.
I can get 15 FPS at 1280x720 on a ThinkPad T61.
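The flow described above could be sketched roughly like this. It is not the original program: the detectors are passed in as plain callables so the sketch stays self-contained (in OpenCV they would typically be `cv2.CascadeClassifier` objects called through `detectMultiScale`), and `FaceGallery`, its feature vectors and its Euclidean distance threshold are hypothetical stand-ins for whatever the real recognizer does internally.

```python
import math

def faces_with_eyes(frame, detect_faces, detect_eyes):
    """Keep only face candidates that contain at least one detected eye.

    detect_faces(frame) -> list of (x, y, w, h) regions (cheap, many false positives)
    detect_eyes(frame, roi) -> list of eye boxes found inside that region
    Both callables are assumptions; plug in real cascade detectors here.
    """
    return [roi for roi in detect_faces(frame) if len(detect_eyes(frame, roi)) >= 1]

class FaceGallery:
    """Hypothetical enroll-or-update store: one list of samples per known face."""

    def __init__(self, threshold):
        self.threshold = threshold  # max distance to count as the same face (assumed)
        self.samples = {}           # identity id -> list of feature vectors

    def observe(self, features):
        """Return the identity for this face, enrolling a new one if unknown."""
        best_id, best_dist = None, math.inf
        for ident, vectors in self.samples.items():
            for v in vectors:
                d = math.dist(features, v)
                if d < best_dist:
                    best_id, best_dist = ident, d
        if best_id is None or best_dist > self.threshold:
            # Unknown face: enroll it under a new identity.
            best_id = len(self.samples)
            self.samples[best_id] = []
        # New or known, the sample is stored to strengthen that identity.
        self.samples[best_id].append(features)
        return best_id
```

The eye check is what keeps the cheap first pass usable: the face detector is allowed to be wrong often, as long as the second stage throws out regions with no eyes before they reach the more expensive recognizer.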