Haha, I am sorry. I spit my coffee reading this. It is ofc totally OK to not know what ground truth means, but the irony was too funny. Yes, ground truth will always be superior to anything else :)!
Ground truth will always be superior on the "does this match the ground truth?" metric, but that's often just a proxy for output quality, and the model will be judged differently once deployed (e.g. "do human users like this?").
That's something to be aware of, especially when you're using convenience data of unknown quality to evaluate your model – many research datasets scraped off the internet with little curation and labeled in a rush by low-paid workers contain a lot of SEO garbage and labeling errors.
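To make the labeling-error point concrete, here is a tiny sketch with made-up toy data (the class names, the ten items, and the two flipped labels are all hypothetical, not taken from any real dataset). It assumes plain accuracy as the "matches the ground truth" metric and shows that a model that is right on every item can score lower than one that simply reproduces the annotators' mistakes.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


# Hypothetical toy data: the real classes of ten items.
true_classes = ["cat", "dog"] * 5

# The dataset's "ground truth", with two labeling errors (indices 2 and 7).
noisy_labels = list(true_classes)
noisy_labels[2] = "dog"
noisy_labels[7] = "cat"

# A model that is correct on every item.
perfect_model = list(true_classes)
# A model that happens to reproduce the labeling errors.
error_copying_model = list(noisy_labels)

# Scores against the noisy labels (what the benchmark reports).
score_perfect = accuracy(perfect_model, noisy_labels)
score_copying = accuracy(error_copying_model, noisy_labels)

# Scores against the real classes (what you actually care about).
real_perfect = accuracy(perfect_model, true_classes)
real_copying = accuracy(error_copying_model, true_classes)

print(f"benchmark: perfect={score_perfect:.1f} copying={score_copying:.1f}")
print(f"reality:   perfect={real_perfect:.1f} copying={real_copying:.1f}")
```

On this toy data the benchmark ranks the two models in the opposite order from reality, which is the proxy problem in miniature.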
I always wanted to meet the team behind Ground Truth. It’s truly remarkable what they have built. Every time AI models show up, these guys outperform them on every metric.
Anyone have any contacts? They seem to be extremely elusive.