HN Mirror



	by newswasboring 1149 days ago
	How is this model performing better than LLaMA on a lot of tasks[1] even though it's trained on a fifth of the data (200 billion vs. 1 trillion tokens)? [1] https://github.com/openlm-research/open_llama#evaluation

3 comments

YetAnotherNick 1149 days ago

They are likely doing some interpolation for 200B or benchmarking it the wrong way. For example, the HellaSwag accuracy for LLaMA 7B is 0.76[1], but the repo lists it as 0.56. Judging by the charts, LLaMA scores higher than 0.56 even at 200B tokens.

[1]: https://arxiv.org/pdf/2302.13971.pdf


byefruit 1149 days ago

They ran lm-evaluation-harness on both this model and the original llama weights, which is the correct way to do it.

Many people have been struggling to reproduce the benchmark numbers included in the original llama paper.


slekker 1149 days ago

Nobody knows :^)


tarruda 1149 days ago

Maybe it uses a higher-quality dataset.
