| HN Mirror



	by riku_iki 968 days ago
	I think you can bring a contamination claim against every public benchmark result nowadays: models are trained on TBs of data crawled from the internet, and there is no guarantee the benchmark has not leaked in some way.

1 comments

spmurrayzzz 968 days ago

With respect to the pretraining data, it's true that we're probably SOL there in terms of verification. But for fine-tuning, they could still publish the dataset so that others can try to reproduce their results and audit it for contamination.

If we're comparing benchmark deltas between different fine-tuned variants that share the same base models, that seems like the bare minimum we should expect to come along with performance claims.
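To make "audit for contamination" concrete, here is a rough sketch of one common approach: flag benchmark items that share a long word n-gram with the published fine-tuning set. The function names, the whitespace tokenization, and the n-gram length are all illustrative assumptions on my part, not anyone's actual audit procedure, and a real audit would also need to catch paraphrases and reformatted solutions.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_contaminated(train_texts, benchmark_texts, n=8):
    """Return indices of benchmark items sharing any n-gram with the training set.

    Hypothetical helper: exact n-gram overlap only, so it misses paraphrases.
    """
    train_grams = set()
    for text in train_texts:
        train_grams |= ngrams(text, n)
    return [
        idx
        for idx, text in enumerate(benchmark_texts)
        if ngrams(text, n) & train_grams
    ]


# Toy data, invented for illustration only.
train = ["write a function that returns the sum of two integers a and b"]
bench = [
    "Write a function that returns the sum of two integers",
    "reverse a linked list in place without extra memory",
]
print(flag_contaminated(train, bench, n=5))  # -> [0]
```

None of this is possible without the dataset being public, which is the point.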


riku_iki 968 days ago

I think both pretraining and fine-tuning data are essential, secret information for commercial models/services.


spmurrayzzz 968 days ago

In the case of Phind though, they also publish their models on HF with similar bold performance claims without publishing the datasets: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2

Even if I were to grant that their subscription product has some secret sauce they want to keep close to the chest (ignoring for a moment that their paid product is GPT-4 based), not publishing the datasets for the models they release to the open source community free of charge under a commercially permissive license seems suspect.

I realize this sort of open source contribution is mostly for marketing purposes, but I think being critical of the performance claims is still valid.
