I think saying "track record" is then factually wrong. Merriam Webster defines track record as: "a record of past performance often taken as an indicator of likely future performance". In that case, you should be able to point to the record.
I assume you will still stand behind the essence of your comment. In that case, it would be better to say "Based on my experience on playing with their models, I have strong reasons to believe that they continuously cheat on benchmarks by training on test datasets." You can then also add that this maps with what you hear from others in the field.
Hmm, even if the model performed poorly on real world task vs benchmark, it doesn't necessarily imply they train on the benchmarks themselves, right? They did the train, test split properly. Didn't train on the test. But the benchmark itself was bad at representing real world tasks? Is so, seems pretty wild to accuse a company of training on test data.. maybe this is "vibe commenting" and I'm just out of loop.