i think <third party evals platform> will help us do that best on their standardized model matrix. for frontiercode’s launch we were focused on.. the frontier models
What qualifies as a frontier model? From my personal "taste tests", I wouldn't have placed Sonnet or Kimi above Deepseek Pro or MiMo, or Gemini 3.1 Flash Lite above Deepseek Flash, but they're listed in the benchmark.