HN Mirror



	by swyx 2 days ago
	I think <third party evals platform> will help us do that best on their standardized model matrix. For frontiercode’s launch we were focused on... the frontier models.

1 comment

VulgarExigency 2 days ago

What qualifies as a frontier model? From my personal "taste tests", I wouldn't have placed Sonnet or Kimi above DeepSeek Pro or MiMo, or Gemini 3.1 Flash Lite above DeepSeek Flash, but they're listed in the benchmark.
