The elo ranking [1] is too good to be true. I don't know why gemma-4-26b-a4b performs better than gemma-4-31b.
Also waiting for more bugfixes in llama.cpp, sglang and vllm to do proper evaluations.
[1] https://arena.ai/leaderboard/text/expert?license=open-source