Isn't that an issue? For instance, if someone made a video "out of your domain" (e.g. generated by a different model than the internal training examples), how would the model perform? Would the AUC be impacted? What is the PPV? It seems common in these results that people are seeing false positives; I did as well. If 10% of the news we read is fake and the model (AUC and operating point on a test set unpublished) has 92% sensitivity and specificity, we would still expect roughly 44% of model positives to be false positives, i.e. a PPV of only about 56%. If the "accuracy" is computed on an unbalanced dataset, what is to be taken from it?
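To put numbers on the base-rate point above, here is a minimal sketch using Bayes' rule. It assumes the figures from the comment (10% prevalence, 92% sensitivity and specificity); the function name `ppv` is just for illustration, and none of these numbers come from the actual model.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(actually fake | model flags as fake)."""
    true_pos = sensitivity * prevalence                # fake items correctly flagged
    false_pos = (1 - specificity) * (1 - prevalence)   # real items wrongly flagged
    return true_pos / (true_pos + false_pos)

# Assumed scenario from the comment: 10% of items are fake.
realistic = ppv(prevalence=0.10, sensitivity=0.92, specificity=0.92)
print(f"PPV at 10% prevalence: {realistic:.3f}")               # 0.561
print(f"Flagged items that are real: {1 - realistic:.3f}")     # 0.439

# Same model evaluated on a balanced (50/50) test set.
balanced = ppv(prevalence=0.50, sensitivity=0.92, specificity=0.92)
print(f"PPV at 50% prevalence: {balanced:.3f}")                # 0.920
```

So the same operating point looks like 92% precision on a balanced test set but only about 56% once fakes are 10% of what people actually read, which is why a single "accuracy" number without the class balance doesn't tell you much.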
What happens is it essentially collapses, since it requires a set of people to train the model. That means a set of people, with their biases, is training an AI to determine what is fake and what isn't.
Sounds like a pretty bad idea, especially if they decide to be gatekeepers of factual articles. It requires the entire team to know their biases one way or another, regardless of whether they think it's "right" or not.