Hacker News
by nylonstrung 197 days ago
My experience with DeepSeek and Kimi is quite the opposite: they're smarter than benchmarks would imply.

Whereas the benchmark gains seen in new OpenAI, Grok and Claude models don't feel accompanied by a vibe improvement.