| As an engineer I find myself in this type of situation quite often - if anyone can point me to some good resources or has any advice, I'd be quite grateful: - Some non-technical stakeholder comes to me and says "can we solve this problem with Machine Learning?" Usually it's something like "there need to be two supervisors on the factory floor at all times, and I want an email alert every time there are fewer than 2 supervisors for more than 20 minutes". - I ask for some sample footage to build a prototype and get a few very poor-quality videos, at a very different standard from what I see in most of these tutorials. - I find some pre-trained model that is able to do people detection or face detection and return bounding rectangles, and download it in whatever form. - After about 30 minutes of fiddling and googling errors, I run it against the sample footage. - I get about 60% accuracy - this is no good. Where do I go from here? Keep trying different models? There are all sorts of models like YOLO and SSD and RetinaNet and YOLOv2 and YOLOv3. - At some point I try a bunch of models and all of them are at best 75% good. At this point I figure I should train it with my own dataset, and so I guess I need to arrange to have this stuff labelled. In my experience stakeholders are usually willing to appoint someone to do it, but they want to know how much footage they need to label, whether their team will need special training to do the labelling, and whether, after it's all done, this is even going to work. What are some effective / opinionated workflows for this part of the overall process that have worked well for you? What's a labelling tool that non-technical users can use intuitively? How good are tools/services like Mechanical Turk and Ground Truth? This part of the process costs time and money - stakeholders, particularly non-technical managers, tend to want an answer beforehand - "If we spend all this time and money labelling footage, how well is this going to work?
How much footage do we need to label?" How do you handle these kinds of conversations? I find this space fairly well-populated with ML tutorials and resources but haven't been able to find content that is focused on this part of the process. |
I believe your issue can be easily solved: have supervisors wear a color that distinguishes them from non-supervisors. For example, let's say it's yellow.
OK, so now you have yellow-wearing supervisors and everyone else. To resolve the issue you described, acquire a month or so of footage, with a label for each minute recording how many yellow-wearing supervisors and how many people in total are on the floor.
So the data you have for each minute is:
1. Number of yellow-wearing supervisors
2. Total number of workers on the floor
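To make the labelling task concrete for whoever does it, here is a minimal sketch of what one label per minute could look like. The CSV column names, the `MinuteLabel` class, and the sample rows are all made up for illustration; the only thing taken from the description above is that each minute gets a supervisor count and a total head count.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class MinuteLabel:
    minute: str        # timestamp of the labelled minute (hypothetical format)
    supervisors: int   # people wearing the supervisor color
    total_people: int  # everyone visible on the floor, supervisors included


def load_labels(text):
    """Parse CSV text with columns minute,supervisors,total_people."""
    labels = []
    for row in csv.DictReader(io.StringIO(text)):
        label = MinuteLabel(
            minute=row["minute"],
            supervisors=int(row["supervisors"]),
            total_people=int(row["total_people"]),
        )
        # Cheap sanity check that catches swapped columns or typos.
        if label.supervisors > label.total_people:
            raise ValueError(f"more supervisors than people at {label.minute}")
        labels.append(label)
    return labels


# Illustrative rows only, not real data.
sample = """minute,supervisors,total_people
2024-01-08T09:00,2,14
2024-01-08T09:01,1,13
"""
labels = load_labels(sample)
```

A spreadsheet exported to CSV is enough for this, so the labelling team would not need a dedicated annotation tool for counts alone.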
Then with this data you can train a network to do what you're describing fairly easily. Assuming there are a lot of workers on the floor, general person detection or face detection would require too much labelled data. Just have a uniform enforced and train on the colors and presence.
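Once something produces a supervisor count per minute (the trained network, or the hand labels while prototyping), the alert rule from the question reduces to a run-length check. The sketch below assumes one count per minute in time order; the function name `understaffed_alerts` and its parameters are hypothetical, and sending the email is left out.

```python
def understaffed_alerts(counts, required=2, max_minutes=20):
    """Return the start index of every run in which the count stays below
    `required` for more than `max_minutes` consecutive minutes.

    `counts` is assumed to hold one supervisor count per minute.
    Each run produces at most one alert.
    """
    alerts = []
    run_start = None
    for i, count in enumerate(counts):
        if count < required:
            if run_start is None:
                run_start = i
            # Fire once, on the minute the run first exceeds the limit.
            if i - run_start + 1 == max_minutes + 1:
                alerts.append(run_start)
        else:
            run_start = None
    return alerts


# 5 staffed minutes, 21 understaffed, 3 staffed, 20 understaffed, 1 staffed.
counts = [2] * 5 + [1] * 21 + [2] * 3 + [0] * 20 + [2]
alerts = understaffed_alerts(counts)  # only the 21-minute run qualifies
```

Because the rule only fires after a long run, a model that miscounts the occasional minute can still be useful if the counts are smoothed first, which is worth mentioning when stakeholders ask how accurate the model needs to be.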