Hacker News
by brucethemoose2 996 days ago
AFAIK, such tests just feed the model chopped-up pieces of the evaluation data as raw strings, decoding at zero temperature. If the model completes them verbatim, the data is probably in its training set.
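Roughly, the check could look like the sketch below. This is only an illustration of the idea, not how any particular benchmark does it: `complete` is a hypothetical callable standing in for a greedy (temperature 0) call to whatever model you are testing, and the 50% split point and the helper names are my own assumptions.

```python
from typing import Callable, Iterable


def split_sample(text: str, prefix_frac: float = 0.5) -> tuple[str, str]:
    """Chop one evaluation string into a prompt prefix and the expected rest."""
    cut = int(len(text) * prefix_frac)
    return text[:cut], text[cut:]


def is_memorized(text: str, complete: Callable[[str], str],
                 prefix_frac: float = 0.5) -> bool:
    """True if the model reproduces the held-back part of the sample verbatim.

    `complete` is a hypothetical stand-in for a deterministic (temperature 0)
    completion call; it takes a raw prompt and returns the generated text.
    """
    prefix, expected = split_sample(text, prefix_frac)
    if not prefix or not expected:
        return False  # too short to test meaningfully
    return complete(prefix).startswith(expected)


def contamination_rate(samples: Iterable[str], complete: Callable[[str], str],
                       prefix_frac: float = 0.5) -> float:
    """Fraction of samples whose continuation the model completes verbatim."""
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(is_memorized(s, complete, prefix_frac) for s in samples)
    return hits / len(samples)


# Toy stand-in for a model that has memorized exactly one string.
MEMORIZED = "The quick brown fox jumps over the lazy dog."


def toy_complete(prefix: str) -> str:
    if MEMORIZED.startswith(prefix):
        return MEMORIZED[len(prefix):]
    return " something unrelated"


rate = contamination_rate(
    [MEMORIZED, "A held-out question nobody has seen."],
    toy_complete,
)
```

With a real model you would swap `toy_complete` for your own API call. Exact string matching is the strictest version of the test; a fuzzier comparison would catch near-verbatim completions too, at the cost of more false positives.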