Hacker News
by tgtweak 55 days ago
I've been using it in a few harnesses (FP8 quant, max context length), and it does seem to get tripped up by tool use, often re-issuing the same tool call after that call has already failed. That's usually not a great sign for long-term context and multi-step reasoning. It is excellent at one-shotting, though, and might be most useful as a sub-agent under a stronger frontier coordinator.

Yeah, that tracks. Tool repetition on failure is a classic sign the model isn't really reading its own context. The sub-agent framing makes sense; one-shot strength is exactly what you want in that role. (Also, my original comment somehow got flagged, which is classic HN, lol.)
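The failure mode described in the thread, where the model re-issues a tool call that just failed, is something a harness can at least detect on its own side. Here is a minimal Python sketch. It assumes the harness keeps a history of `(call, succeeded)` pairs and treats two calls as "the same" when the tool name and arguments match exactly. `ToolCall`, `make_call`, and `find_repeated_failures` are hypothetical names and are not part of any particular harness.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of one tool invocation; frozen so it is hashable."""
    name: str
    args: Tuple[Tuple[str, Any], ...]  # sorted (key, value) pairs; values must be hashable


def make_call(name: str, **kwargs: Any) -> ToolCall:
    # Sort the arguments so that identical calls compare equal regardless of keyword order.
    return ToolCall(name, tuple(sorted(kwargs.items())))


def find_repeated_failures(history: List[Tuple[ToolCall, bool]]) -> List[int]:
    """Return the indices of calls that exactly repeat an earlier call that failed."""
    failed = set()
    repeats = []
    for i, (call, succeeded) in enumerate(history):
        if call in failed:
            repeats.append(i)  # the model is retrying something that already failed
        if not succeeded:
            failed.add(call)
    return repeats


# Illustrative history (made up, not from any real run).
history = [
    (make_call("read_file", path="a.txt"), False),
    (make_call("read_file", path="b.txt"), True),
    (make_call("read_file", path="a.txt"), False),
    (make_call("read_file", path="a.txt"), False),
]
print(find_repeated_failures(history))  # → [2, 3]
```

This is only a heuristic. A legitimate retry after a transient error (a timeout, say) gets flagged too, so a real harness would probably use it as a signal for nudging the model or escalating to a coordinator rather than as a hard block.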