by riku_iki
1196 days ago
Probably not, if the gap is that large: 7B vs. 175B. Also, all of those benchmarks are trash, because they can't track data leaks into the training data. For example, they trained LLaMA on GitHub, where the GSM8K eval data is hosted, so of course the model will perform well on GSM8K: it memorized the answers.
I’m sure there are issues like the ones you describe. Nevertheless, you seem to be a staunch defender of GPT-3, which to me suggests some kind of bias? Like, who cares if LLaMA is better? In fact, isn’t that a good indicator of progress?