| The harness is really important. It matters so much - possibly even more than the model. We had harness crashes after running many agents - granted, we were doing quite a bit with it. Grok Build (as a product) review here: the UX friction is worse than Claude Code's - but it seems to be a strange positioning choice - they're more on the 'vibe' side than the 'agentic engineering' side of things. The largest issue was actually reviewing output - but if you're going to make that largely opaque to the user, why choose a CLI-based interface that's so mouse-heavy? There are also problems with the actual model. Thinking is visible, and every interaction goes like this: "I would like you to investigate adding an API route to tackle x,y,z"
*Grok, thinking: "Okay - the user has asked me to add an API route to tackle x,y,z"* There are other absolutely absurd quirks too - "I have no tools available in my context" being visible in the CoT. The auto-approval (yellow, auto-mode) review in Claude Code via Opus is a killer feature - every build-it CLI should be offering this for long-horizon tasks. We messaged one of the engineers about our experience - no response. You'd be better off with Claude Code 5x Max than the 300 USD/month subscription. |