|
> Then a brick hits you in the face when it dawns on you that all of our tools are dumping crazy amounts of non-relevant context into stdout thereby polluting your context windows. I've found that letting the agent write its own optimized script for dealing with some things can really help with this. Claude is now forbidden from using `gradlew` directly, and can only use a helper script we made. It clears, recompiles, publishes locally, tests, ... all with a few extra flags. And when a test fails, the stack trace is printed. Before this, Claude had to do A TON of different calls, all messing up the context. And when tests failed, it started to read gradle's generated HTML/XML files, which damaged the context immensely, since they contain a bunch of inline javascript. And I've also been implementing this "LLM=true"-like behaviour in most of my applications. When an LLM is using it, logging is less verbose, it's also deduplicated so it doesn't show the same line a hundred times, ... > He sees something goes wrong, but now he cut off the stacktraces by using tail, so he tries again using a bigger tail. Not satisfied with what he sees HE TRIES AGAIN with a bigger tail, and … you see the problem. It’s like a dog chasing its own tail. I've had the same issue. Claude was running the 5+ minute test suite MULTIPLE TIMES in succession, just with a different `| grep something` tacked at the end.
Now, the scripts I made always logs the entire (simplified) output, and just prints the path to the temporary file. This works so much better. |
I think my question at this point is what about this is specific to LLMs. Humans should not be forced to wade through reams of garbage output either.