It also can't really be overstated how helpful it is as an ML engineer to simply spend the time going through thousands of examples yourself. If you abstract yourself away from the data and just "make metric go up" you'll be missing out on valuable insights about how and why your model might be failing.
It's almost as if (bear with me ...) these "artificial intelligences" actually need "human intelligences" to guide them. Maybe we can think up a "system" where "experts" can codify rules for the "artificial intelligence" to follow.
OK, the sarcasm got too thick, but my point is: if the engineer has to spend the time combing through thousands of examples, then you don't have AI; you have a man in a box pretending to be a machine that plays chess.
For my one foray into ML, in 2020, I also built my own labeling system. It was stupidly simple; IIRC, it was a Jupyter Notebook that presented you with text to label, and you’d do so by hitting 1-5, which were mapped to sentiments / emotions. If you got bored, or just wanted to see how it performed with X% training, you could save progress and quit. It worked well enough, and I think I labeled a couple of thousand entries using it.
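For anyone wanting to try something similar, here's a rough sketch of that kind of keyboard-driven labeler with save/resume. It's a plain Python loop using `input()` rather than the original Jupyter Notebook, and the label names, the JSON progress file, and the function names (`label_loop`, `load_progress`, `save_progress`) are all hypothetical, not the actual code described above.

```python
import json
import os

# Hypothetical key-to-label mapping; the real labels weren't specified.
LABELS = {"1": "joy", "2": "sadness", "3": "anger", "4": "fear", "5": "neutral"}


def load_progress(path):
    """Return previously saved labels as {text_index: label}, or {} if none."""
    if os.path.exists(path):
        with open(path) as f:
            # JSON object keys are strings, so convert them back to ints.
            return {int(k): v for k, v in json.load(f).items()}
    return {}


def save_progress(path, labels):
    """Write the current labels to a JSON file so a later session can resume."""
    with open(path, "w") as f:
        json.dump({str(k): v for k, v in labels.items()}, f)


def label_loop(texts, path, read_key=input):
    """Show each unlabeled text; keys 1-5 assign a label, 'q' saves and quits."""
    labels = load_progress(path)
    for i, text in enumerate(texts):
        if i in labels:
            continue  # already labeled in an earlier session
        print(f"[{i}] {text}")
        while True:
            key = read_key("1-5 to label, q to save and quit: ").strip()
            if key == "q":
                save_progress(path, labels)
                return labels
            if key in LABELS:
                labels[i] = LABELS[key]
                break
            print("Unrecognized key, try again")
    save_progress(path, labels)
    return labels
```

Passing `read_key` as a parameter keeps the loop testable (you can feed it scripted keystrokes), and saving by index means quitting partway and rerunning picks up where you left off, which also lets you train on whatever fraction you've labeled so far.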
I ALSO have resorted to building my own labeling tool, even though there are great generic labeling tools out there. I think this is a missing piece of the landscape, but I don't know enough about the space yet to say what the solution should be.