| HN Mirror


by PeachPlum 3156 days ago

I understand and agree with your nuance.

If I had a store, though, I would count someone walking past one way and then the other as two potential visits.

I would imagine that summing "people in room at time t" and "people in room at time t+1" is quite a good proxy for "number of people present all day"; it is certainly an upper bound.

2 comments

abainbridge 3156 days ago

It's not as simple as that. The problem is that YOLO can process ~100 FPS, where a frame is something like 250x250 pixels. If you want to process high resolution, you have to sweep a 250x250 "window" across the high-res image and run YOLO on each. Then you have to figure out which objects were counted more than once due to the windowing. Even if the windows don't overlap (which they should), YOLO might detect the left half of a person in one window and the right half in the next.
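For anyone who wants to see the windowing bookkeeping spelled out, here's a rough sketch in plain Python. It is not YOLO's own code: `tile_origins`, `to_global` and `merge_detections` are names made up for illustration, the 50-pixel overlap and 0.5 IoU threshold are assumptions, and the merge step is ordinary greedy non-maximum suppression.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def tile_origins(length: int, window: int, overlap: int) -> List[int]:
    """Start offsets of windows that cover [0, length) along one axis."""
    if length <= window:
        return [0]
    stride = window - overlap  # assumes 0 <= overlap < window
    origins = list(range(0, length - window, stride))
    origins.append(length - window)  # last window sits flush with the edge
    return origins


def to_global(box: Box, ox: float, oy: float) -> Box:
    """Shift a window-local box into full-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(
    dets: List[Tuple[Box, float]], iou_thresh: float = 0.5
) -> List[Tuple[Box, float]]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

This only removes duplicates where two windows both saw roughly the whole person. The half-a-person-per-window case yields two boxes that barely overlap, so it slips straight through and needs something smarter.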

Once you've done all that, a top-end GPU can handle about 4 FPS (assuming 1080p input).

Then the problem is that YOLO will occasionally miss blindingly obvious objects. That combined with the fact that you've only got 4 FPS means that detecting the direction of a person is hard - they tend to move across the camera's field of view before you've got enough data to be confident what just happened. A person walking from left to right looks the same as someone walking off the left of the frame and then someone different walking onto the right of the next frame.

At some point it's easier, cheaper and more accurate to install an IR laser beam and count the breaks. You'll save about half a kilowatt too.

Another interesting point is that YOLO is pre-trained on hundreds of object classes. This feels like a waste. I wanted to retrain it with all but the people class removed from the training set. My learned colleague suggested that was a stupid idea because YOLO learns general info about how to separate objects from backgrounds from all the object classes. Not showing it surfboards makes it worse at detecting people. Crazy.


crankylinuxuser 3155 days ago

YOLO's a bad way to do it. I tried a bunch of facial recog libraries and had poor results overall at larger frame sizes and framerates. I also don't have a CUDA/OpenCL card in my laptops, so it's CPU for me... Alas.

OpenCV's FaceRecognizer class is one of the fastest I found, and it's what I used in my program.

Primarily, it runs an LBP cascade to find "any face", including walls that happen to look like faces. Thankfully its errors are false positives and almost never false negatives. Then I take each region of interest and run a Haar cascade for eyes on it. If there's at least 1 eye in the region of interest, I pass it to the classifier.

From there, the image region is run through the classifier. If the face isn't in there, the classifier adds it. If it is, the region is added as another sample to further confirm that face.

I can get 15 FPS at 1280x720 on a ThinkPad T61.
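A minimal sketch of that two-stage gate, for illustration only and not the code from my program. The detectors are passed in as plain callables so the logic stands on its own; with OpenCV each one would presumably wrap a `cv2.CascadeClassifier` and its `detectMultiScale` call. `crop`, `confirmed_faces` and `min_eyes` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)
Image = Sequence[Sequence[int]]   # grayscale image as rows of pixel values


def crop(image: Image, rect: Rect) -> List[Sequence[int]]:
    """Cut the region of interest out of the image."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def confirmed_faces(
    image: Image,
    find_faces: Callable[[Image], List[Rect]],  # permissive stage, e.g. an LBP cascade
    find_eyes: Callable[[Image], List[Rect]],   # stricter stage, e.g. a Haar eye cascade
    min_eyes: int = 1,
) -> List[Rect]:
    """Keep only the face candidates that contain at least `min_eyes` eyes."""
    confirmed = []
    for rect in find_faces(image):
        roi = crop(image, rect)
        if len(find_eyes(roi)) >= min_eyes:
            confirmed.append(rect)
    return confirmed
```

The point of the ordering is that the cheap, permissive detector runs on the full frame and the second cascade only ever sees small crops.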


rahimnathwani 3155 days ago

If the same face is in two consecutive frames, do you pass it into the classifier twice?


crankylinuxuser 3154 days ago

Sure do. Doing that increases the quality of the classifier for that face-hash. That also helps if they show up a bit later with slightly different lighting.

I also implemented a "no more than 50 samples per matched face" to keep the size of the face-hash-db down.
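Roughly, the add-or-reinforce logic with the cap looks like the sketch below. This is a simplified illustration rather than the code in the repo: `FaceGallery`, `observe` and the `match` callable are hypothetical names, and dropping new samples once a face is full is an assumption (evicting the oldest would also satisfy the limit).

```python
from typing import Any, Callable, Dict, List, Optional

MAX_SAMPLES = 50  # the "no more than 50 samples per matched face" limit


class FaceGallery:
    """Stores up to `max_samples` samples per known face."""

    def __init__(
        self,
        match: Callable[[Any, Dict[int, List[Any]]], Optional[int]],
        max_samples: int = MAX_SAMPLES,
    ) -> None:
        # `match` is a stand-in for the recognizer: it returns the id of the
        # known face that the sample belongs to, or None if it is a new face.
        self._match = match
        self._max = max_samples
        self.samples: Dict[int, List[Any]] = {}
        self._next_id = 0

    def observe(self, sample: Any) -> int:
        """Add a new face, or reinforce a known one; returns the face id."""
        face_id = self._match(sample, self.samples)
        if face_id is None:
            face_id = self._next_id
            self._next_id += 1
            self.samples[face_id] = []
        # Assumption: once a face is full, further samples are ignored.
        if len(self.samples[face_id]) < self._max:
            self.samples[face_id].append(sample)
        return face_id
```

Feeding the same face in on consecutive frames just calls `observe` repeatedly, so the cap is what stops a person who stands in front of the camera for a minute from bloating the database.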

https://hackaday.com/2015/03/04/face-recognition-for-your-ne...

My old code is currently on GitLab at gitlab.com/crankylinuxuser. It's pretty crappy as it was a weekend hack. I need to separate the engine from the GUI and make the GUI web accessible. There are a few more pieces to do for that, but I was looking at selling it for various purposes.


rahimnathwani 3151 days ago

Awesome. Thanks for the clarification re the 50 sample limit. And thanks for sharing the code.


rahimnathwani 3155 days ago

"If I had a store, though, I would count someone walking past one way and then the other as two potential visits."

Correct. But if someone walks past really slowly, would you count that as multiple visits, because they appear in multiple frames?
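To make the difference concrete, here's a toy sketch. It assumes something upstream (a tracker or a face recognizer, hypothetical here) has already labelled each detection with a person id, which is of course the hard part. `naive_count`, `visit_count` and `max_gap` are made-up names.

```python
from typing import Dict, Hashable, Iterable, List


def naive_count(frames: List[Iterable[Hashable]]) -> int:
    """Sum of per-frame detections: a slow walker is counted once per frame."""
    return sum(len(list(ids)) for ids in frames)


def visit_count(frames: List[Iterable[Hashable]], max_gap: int = 0) -> int:
    """Count a new visit only when an id appears after being absent
    for more than `max_gap` consecutive frames."""
    last_seen: Dict[Hashable, int] = {}
    visits = 0
    for t, ids in enumerate(frames):
        for pid in ids:
            if pid not in last_seen or t - last_seen[pid] > max_gap + 1:
                visits += 1
            last_seen[pid] = t
    return visits


frames = [{"a"}, {"a"}, {"a"}, set(), set(), set(), {"a"}, {"a", "b"}]
print(naive_count(frames))             # 6
print(visit_count(frames, max_gap=1))  # 3: "a" twice, "b" once
```

Here "a" walks past, leaves, and comes back, so two visits for "a" matches the store owner's intuition, while the per-frame sum of 6 depends mostly on walking speed and frame rate.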

'<SNIP> is quite a good proxy for "number of people present all day"'

The term to which I objected was 'Foot Traffic'. Whether something is a good proxy for <something other than foot traffic> is irrelevant.
