| HN Mirror


	by tgtweak 55 days ago
	I've been using it in a few harnesses (FP8 quant, max context length) and it does seem to get tripped up by tool use, often repeating the same tool call after it has already failed. That's usually not a great sign for long-term context and multi-step reasoning. It is excellent at one-shotting, though, and might be most useful as a sub-agent under a stronger frontier coordinator.
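	For anyone who wants to catch this failure mode in their own harness, here is a minimal sketch of a guard that flags an exact repeat of a tool call that has already failed. Everything in it is hypothetical (the `RepeatedFailureGuard` class, its method names, the `read_file` tool); it is not taken from any particular agent framework and only assumes tool arguments are JSON-serializable.

```python
import json
from collections import Counter


class RepeatedFailureGuard:
    """Hypothetical helper: remembers failed tool calls and flags exact repeats."""

    def __init__(self, max_failures: int = 1):
        # Number of failures of an identical call tolerated before blocking it.
        self.max_failures = max_failures
        self._failures = Counter()

    @staticmethod
    def _key(tool_name: str, arguments: dict) -> str:
        # Canonicalize the arguments so key order does not matter.
        return tool_name + ":" + json.dumps(arguments, sort_keys=True)

    def record_failure(self, tool_name: str, arguments: dict) -> None:
        self._failures[self._key(tool_name, arguments)] += 1

    def should_block(self, tool_name: str, arguments: dict) -> bool:
        # True once the identical call has failed max_failures times.
        return self._failures[self._key(tool_name, arguments)] >= self.max_failures


guard = RepeatedFailureGuard()
call = ("read_file", {"path": "a.txt"})  # hypothetical tool and arguments

guard.record_failure(*call)
if guard.should_block(*call):
    print("identical call already failed; ask the model to change approach")
```

	A harness could use the blocked signal to inject a reminder into the context or to hand control back to the coordinator, rather than letting the model loop on the same call.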

1 comment

ItsClo688 54 days ago

Yeah, that tracks. Tool repetition on failure is a classic sign the model isn't really reading its own context. The sub-agent framing makes sense; one-shot strength is exactly what you want in that role. (Also, my original comment somehow got flagged, which, classic HN lol)
