My exact thoughts, especially because DeepSeek-V2 is meant to be a massive improvement.
There seems to be an emerging trend people should look out for: model release sheets often compare against out-of-date models, so they don't inform so much as just try to make the model look "best."
It's an annoying trend. Untrustworthy metrics betray untrustworthy morals.
They could have compared against DeepSeek-Coder-V2-Lite-Instruct. That's a 16B model, and it scores 24.3 on LiveCodeBench. Given the size difference, they're respectably close: only just behind at 23.4. The full V2 is way ahead.