For R1-0528 specifically, on evals: we're still running them :)) They're quite expensive to run, so we first do a "vibe check" on some internal test cases, and so far the results look pretty good!
In general, though, we'd stress our bug fixes, which measurably improve accuracy by +1% and sometimes up to +10% (for example the Llama 4 bug fixes, the Gemma bug fixes at https://news.ycombinator.com/item?id=39671146, etc.). Those matter much more :)
We also provide Q8_0 and Q8_K_XL quants, which are mostly equivalent to FP8. You can also use the magical `-ot ".ffn_.*_exps.=CPU"` incantation (llama.cpp's `--override-tensor`) to offload the MoE expert layers to system RAM!
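To see which tensors that pattern actually catches, here is a small sketch that applies the regex to a few tensor names. It assumes the override behaves like a regex *search* over each tensor name (matching anywhere in the name), and the tensor names below are illustrative GGUF-style names, not a dump from a real R1-0528 file. `goes_to` is a hypothetical helper, not part of llama.cpp.

```python
import re

# The -ot value has the form "<regex>=<buffer type>"; split it at the last "=".
OVERRIDE = ".ffn_.*_exps.=CPU"
pattern, buffer_type = OVERRIDE.rsplit("=", 1)

def goes_to(tensor_name: str) -> str:
    """Hypothetical helper: 'CPU' if the regex matches anywhere in the name, else 'default'."""
    return buffer_type if re.search(pattern, tensor_name) else "default"

# Illustrative tensor names (assumed GGUF-style naming).
names = [
    "blk.0.ffn_up.weight",         # dense FFN in an early layer
    "blk.3.attn_q_a.weight",       # attention
    "blk.3.ffn_gate_inp.weight",   # MoE router
    "blk.3.ffn_up_shexp.weight",   # shared expert
    "blk.3.ffn_gate_exps.weight",  # routed experts
    "blk.3.ffn_down_exps.weight",
]

for n in names:
    print(f"{n:30} -> {goes_to(n)}")
```

Only the routed-expert tensors (`*_exps`) match, so attention, the router and the shared experts stay wherever the rest of your settings put them (e.g. on GPU), while the large, sparsely used expert weights live in RAM.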
> All Distilled and the original R1 versions seem to have accidentally assigned the padding token to `<｜end▁of▁sentence｜>`, which is mostly not a good idea, especially if you want to further finetune on top of these reasoning models. This will cause infinite generations, since most frameworks will mask the EOS token out as -100.
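A minimal sketch of why pad == EOS leads to runaway generations, assuming a simple collator that copies `input_ids` into `labels` and sets every position equal to `pad_token_id` to -100 (the value PyTorch's `CrossEntropyLoss` ignores by default). `make_labels` and the token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch's default ignore_index)

def make_labels(input_ids, pad_token_id):
    """Hypothetical collator step: copy ids, masking every position equal to the pad id."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

EOS_ID = 2  # hypothetical EOS id
PAD_ID = 0  # hypothetical, distinct pad id

# The same sequence ending in EOS, right-padded to length 6.
seq_pad_is_eos = [15, 27, 8, EOS_ID, EOS_ID, EOS_ID]  # padding reuses the EOS id
seq_pad_distinct = [15, 27, 8, EOS_ID, PAD_ID, PAD_ID]

bad = make_labels(seq_pad_is_eos, pad_token_id=EOS_ID)
good = make_labels(seq_pad_distinct, pad_token_id=PAD_ID)

print(bad)   # [15, 27, 8, -100, -100, -100]
print(good)  # [15, 27, 8, 2, -100, -100]
```

When the pad id equals the EOS id, the real EOS position is masked along with the padding, so finetuning never gives the model a signal to emit EOS and it never learns to stop.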
I couldn't tell whether this was an error in the code running the model or in the model weights themselves. Assuming it's the former, are these fixes being upstreamed anywhere?