| Rest assured that we are better at training models than naming them ;D - New benchmark SOTAs with 77.9% on SWE-Bench-Verified, 79.9% on SWE-Lancer, and 58.1% on TerminalBench 2.0 - Natively trained to work for many hours across multiple context windows via compaction - 30% more token-efficient at the same reasoning level across many tasks. Let us know what you think! |