by mufasachan 994 days ago
One explanation: https://x.com/yampeleg/status/1707127722743325106?s=46&t=Cxa...

I would be curious what he means by "semi-automated system for detecting benchmark leaks", though.

1 comment

AFAIK, such tests just feed the model chopped-up bits of the evaluation data as raw strings at zero temperature. If it completes them verbatim, the data is probably in the training set.
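
A rough sketch of that kind of check, in case it helps. This is not the system from the linked post, just my guess at the general shape. `complete` is a hypothetical stand-in for whatever greedy (temperature 0) completion call you have, and `prefix_frac` and `min_suffix_chars` are made-up knobs. The fake model is only there so the example runs without a real LLM.

```python
from typing import Callable, Iterable


def verbatim_leak_rate(
    samples: Iterable[str],
    complete: Callable[[str], str],  # hypothetical: prompt -> greedy continuation
    prefix_frac: float = 0.5,        # assumed: how much of each sample to show
    min_suffix_chars: int = 20,      # assumed: skip suffixes too short to be meaningful
) -> float:
    """Fraction of tested samples whose hidden suffix is reproduced verbatim."""
    tested = leaked = 0
    for text in samples:
        cut = int(len(text) * prefix_frac)
        prefix, suffix = text[:cut], text[cut:]
        if len(suffix) < min_suffix_chars:
            continue  # a short match could easily happen by chance
        tested += 1
        # Feed only the prefix as a raw string and compare the continuation.
        if complete(prefix).startswith(suffix):
            leaked += 1
    return leaked / tested if tested else 0.0


def make_fake_model(memorized: list[str]) -> Callable[[str], str]:
    """Toy model that regurgitates documents it has 'seen' and nothing else."""
    def complete(prompt: str) -> str:
        for doc in memorized:
            if doc.startswith(prompt):
                return doc[len(prompt):]
        return " I don't know."
    return complete


benchmark = [
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "Completely unseen evaluation question about prime numbers and sums.",
]
model = make_fake_model(memorized=[benchmark[0]])
rate = verbatim_leak_rate(benchmark, model)
print(f"verbatim completion rate: {rate:.2f}")
```

A real check would presumably compare tokens rather than characters and allow near matches, since an exact string match misses leaks that were reformatted before training.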