by hanselot
954 days ago
Isn't it a good thing that, on the benchmarks they ran, the newer model has fewer of the answers memorized (aka it's parroting less)? Wouldn't this be evidence that the model has improved over its predecessor, since it has to solve the problem itself rather than rely on memory? What use is a model that memorizes the answers to all the benchmarks? (See the 7B models on the Open LLM Leaderboard for more on that.)