Hacker News
by hanselot 954 days ago
Isn't it a good thing that, on the benchmarks they ran, the newer model has fewer of the answers memorized (i.e., it's parroting less)?

Wouldn't this be exactly the proof that the model has improved over its predecessor, since it has to solve the problems itself rather than rely on memory?

What use is a model that memorizes the answers to all the benchmarks? (See the 7B models on the Open LLM Leaderboard for more on that.)