by goldenarm 52 days ago
Comparison with Qwen3.6 35B A3B:

- GPQA Diamond: 47.4% vs 84.1% for Qwen

- HLE: 4.4% vs 20.2% for Qwen

- AA Omniscience Accuracy: 6.4% vs 18.9% for Qwen

- AA Hallucination Rate (lower is better): 30.0% vs 50.3% for Qwen