Does anyone know how likely it is that this is the result of imbalanced training data? If you have fewer dark-skinned people and children in the training data, you end up with a model that is less skilled at detecting them.
You'd have to compare the machine's performance to actual human performance. I suspect humans also have uneven detection rates across those categories.