Model makers (both open and closed weight) typically publish benchmarks against other models, and when they do not, people rightfully call them out.
Including a comparison against an unnamed "other OSS engine" is just not helpful (what if it's a sandbagged baseline like HF Transformers?).