
by lubujackson 55 days ago

There is a ton of optimization possible when we can observe how LLMs and agents process and navigate our code under different prompts. For example, our MCP server was pulling down far too much data to resolve a simple "count rows" request. Once you see it, it's easy to fix, but I don't know of a good framework yet for walking through these patterns.

I built an eval framework that looks only at the tool calls made for a static prompt, on the idea that an LLM should be able to deduce the best tool calls and arguments for retrieving the requested data. It's not as good as full observability, but it helps with complex tool interactions. Does anyone have good tools for this problem?
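For anyone wondering what a tool-call-only eval might look like, here is a minimal sketch. It assumes the agent's tool calls for a prompt can be captured as (name, args) pairs and compared against a hand-written expected list. `ToolCall`, `EvalCase`, `score_case`, `run_evals`, and the tool names `count_rows` and `query_rows` are hypothetical; they are not part of any MCP SDK. The fake agent stands in for a real LLM call and mimics the over-fetching behavior described above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    """One tool invocation captured from an agent run (hypothetical shape)."""
    name: str
    args: dict[str, Any]


@dataclass
class EvalCase:
    """A static prompt plus the tool calls we expect a good agent to make."""
    prompt: str
    expected: list[ToolCall]


def score_case(expected: list[ToolCall], actual: list[ToolCall]) -> dict[str, Any]:
    """Compare expected vs. actual tool calls for a single prompt."""
    expected_names = [c.name for c in expected]
    actual_names = [c.name for c in actual]
    # Tools we expected but the agent never called.
    missing = [n for n in expected_names if n not in actual_names]
    # Tools the agent called that we did not expect (e.g. over-fetching).
    extra = [n for n in actual_names if n not in expected_names]
    # Right tool, but no call to it used the expected arguments.
    arg_mismatches = []
    for exp in expected:
        same_tool = [a for a in actual if a.name == exp.name]
        if same_tool and all(a.args != exp.args for a in same_tool):
            arg_mismatches.append(exp.name)
    return {
        "exact": expected == actual,  # same calls, same args, same order
        "missing": missing,
        "extra": extra,
        "arg_mismatches": arg_mismatches,
        "call_count_delta": len(actual) - len(expected),
    }


def run_evals(
    cases: list[EvalCase], run_agent: Callable[[str], list[ToolCall]]
) -> list[tuple[str, dict[str, Any]]]:
    """Run each static prompt through the agent and score its tool calls."""
    return [(c.prompt, score_case(c.expected, run_agent(c.prompt))) for c in cases]


if __name__ == "__main__":
    cases = [
        EvalCase(
            prompt="How many rows are in the orders table?",
            expected=[ToolCall("count_rows", {"table": "orders"})],
        )
    ]

    def fake_agent(prompt: str) -> list[ToolCall]:
        # Stand-in for a real LLM: fetches rows instead of asking for a count.
        return [ToolCall("query_rows", {"table": "orders", "limit": 10000})]

    for prompt, result in run_evals(cases, fake_agent):
        print(prompt, result)
```

Checking names and arguments separately makes it easier to tell "picked the wrong tool" apart from "picked the right tool but asked for too much", which is roughly the distinction the "count rows" case turns on.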

Just as we mentally walk through deterministic logic, SWEs need to learn to anticipate an LLM's context and tool awareness. That is much trickier to reason about, especially since the various LLM IDEs manage context as a black box.