Uneducated guess--somewhere someone will have archived the discussion--but I wonder if the technical complexity was also a factor. If you've made the combining characters for gender, and you've designated which characters can combine with them, but you didn't plan for characters that must be combined only in certain ways, then you're looking at having to spec all of that out, and then implementers will have to do their thing, and it just... wasn't worth it.
But that's not the system. They should be sticking to consistently using the ZWJ + gender to modify a base emoji and they aren't. Consuming three codepoints when one will do is silly.
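For anyone who hasn't looked at how these are built, here's a rough Python sketch of the ZWJ + gender pattern. The helper name `with_gender` is made up for illustration. I'm also appending VARIATION SELECTOR-16, which is how the fully-qualified form of "woman running" is encoded, so the sequence comes out as four code points.

```python
import unicodedata

ZWJ = "\u200d"          # ZERO WIDTH JOINER
VS16 = "\ufe0f"         # VARIATION SELECTOR-16 (requests emoji presentation)
FEMALE_SIGN = "\u2640"
MALE_SIGN = "\u2642"


def with_gender(base: str, sign: str) -> str:
    """Hypothetical helper: build base + ZWJ + gender sign + VS16."""
    return base + ZWJ + sign + VS16


runner = "\U0001F3C3"   # RUNNER, the gender-unspecified base emoji
woman_running = with_gender(runner, FEMALE_SIGN)

# List the code points that make up the sequence.
for ch in woman_running:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Whether that renders as a single glyph depends on the font and platform; the string itself is just those code points in order.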
Their guideline: https://unicode.org/emoji/proposals.html#selection_factors_i...
For instance, they suggest using "elephant" to evaluate the usage:
https://trends.google.com/trends/explore?date=all_2008&gprop...