| There are a load of questions here. > Where do I go from here? Keep trying different models? > ...after [the labeling is] all done is this even going to work? > [How to label] > If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label? Generally, you're discussing the space of model improvement and refinement. This is the costliest and most dangerous part of any ML pipeline. Without good evaluation, stakeholder support, and a real reason to believe that the algorithm can be improved, this is just a hole to throw money into. The short answer to most of these questions is that you don't really know. Generally speaking, more data will improve ML algorithm performance, especially if that data is more specific to your problem. That said, more data may not substantially improve performance. You will get much more leverage by using existing systems, accepting whatever error rate they give you, and building systems and processes around these tools to play to their strengths. People have suggested asking the floor managers to wear a certain color. You could also use the probabilistic bounds implied by the accuracies you're seeing to build a system which doesn't replace manual monitoring, but augments it. Perhaps you can emit a warning when the estimated probability that there aren't enough people on the floor exceeds some threshold. This makes things easier for the person monitoring manually, catches the worst-case scenarios, and helps improve the accuracy of the entire monitoring system. Not only can these systems be implemented more cheaply, they will also provide early wins for your stakeholders and lay the groundwork for a case to invest in the actual ML. They might also reduce the problem space you're working in to the point where you can judge accuracy better and build theories about why the models might be underperforming.
This will support experiments to try out new models, augment the system with other models, or even fine-tune or improve the models themselves for your particular situation. In terms of software development lifecycles, it's only relatively late in the game that you can afford the often nearly bottomless investment of "machine learning research". Early stages should just implement existing, simple models with minimal variation and work on refining the problem so that bigger tools can be supported down the line if the value is there. |
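To make the warning idea a bit more concrete, here is a minimal sketch in Python. It rests on assumptions that are not part of your setup as described: that your detector returns a confidence score per detected person, that those scores are roughly calibrated and independent, and that missed detections can be ignored for a first pass. The names `prob_fewer_than` and `should_warn` and the threshold values are purely illustrative.

```python
from typing import Sequence


def prob_fewer_than(confidences: Sequence[float], required: int) -> float:
    """Probability that fewer than `required` detections are real people.

    Assumption: each detection is an independent Bernoulli trial whose
    success probability is the detector's confidence score.
    """
    # dist[k] = probability that exactly k of the detections seen so far are real
    dist = [1.0]
    for p in confidences:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this detection is a false positive
            nxt[k + 1] += mass * p       # this detection is a real person
        dist = nxt
    return sum(dist[:required])


def should_warn(confidences: Sequence[float], required: int,
                threshold: float = 0.5) -> bool:
    """Warn when the chance of being understaffed exceeds `threshold`."""
    return prob_fewer_than(confidences, required) > threshold


if __name__ == "__main__":
    # Hypothetical frame: two detections, two people required on the floor.
    frame_confidences = [0.9, 0.8]
    print(prob_fewer_than(frame_confidences, required=2))
    print(should_warn(frame_confidences, required=2, threshold=0.2))
```

The threshold is the knob to tune with whoever is doing the manual monitoring: a low value produces more warnings and fewer missed understaffing events, a high value the reverse. If the detector's scores turn out to be poorly calibrated, the probabilities here will be off, which is itself something the manual monitor's feedback can help you measure.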
It has been challenging to communicate many of these realities to non-technical folks, who often seem to have mistaken expectations about implementing these types of systems, as opposed to "non-ML" systems, where there is a clearer and more predictable idea of what's possible, how well it will work, and how much effort is required to pull it off.