| > What's even more suspicious is that these tweets from Elliot Glazer indicate that they are still "developing" the hold-out set, There is nothing suspicious about this and the wording seems to be incorrect. A hold-out set is a percentage of the overall data that is used to test a model. The model is just not trained on it. Model developers normally have full access to it. There is nothing inherently wrong with training on a full or partial hold-out set. It just means you have done a different split to train again. The confusion I see here is that people are equating a hold-out set with a blind set. That's a set of data to test against that the model developers (and model) cannot see. Even so, blind sets can also go stale after a few runs, and nothing is wrong with ingesting that blind set as long as you have a new blind set to run against. Trying to game blind set tests is nothing new and it gets found out very quickly. What I took from the original article is that the blind set is likely unbalanced and the model answered more easy questions than hard ones. |
What on earth? This is from Tamay Besiroglu at Epoch:
So this "confusion" is because Epoch AI specifically told people it was a blind set! Despite the condescending tone, your comment is just plain wrong.