by YetAnotherNick
1144 days ago
They are likely doing some interpolation for the 200B figure, or benchmarking it the wrong way. For example, HellaSwag accuracy for LLaMA 7B is 0.76 [1], but the repo lists it as 0.56. Even at 200B tokens, the charts show LLaMA scoring higher than 0.56.
[1]: https://arxiv.org/pdf/2302.13971.pdf
|
Many people have been struggling to reproduce the benchmark numbers reported in the original LLaMA paper.