by trjordan
14 hours ago
The core of the problem is that there are a million tools that make AI better, and no way to measure whether AI is actually working better. Big companies with popular products can measure this. They do something between normal product analytics and chatbot evals to figure out whether users are successful in their sessions. That's the job. But any given dev, with between 3 and 50 sessions a day? Like, I have no idea what makes the LLM better. It's all vibes. My company has a whole stack here: preferred harnesses, preferred models, skills, the shape of our code, everything. There's gotta be a way to measure whether this setup is working for us, at one-millionth the scale of Claude Code.
What I do with my product is explicitly tell you to ask your agent. I have real-world examples and real-world repositories that you can try it with:
https://gitsense.com
https://github.com/gitsense/smart-ripgrep
https://github.com/gitsense/smart-codex
Average token savings aren't what I'm most interested in, though. I'm more interested in knowing that the AI doesn't load unnecessary files into context, since that can affect its reasoning.
You can just ask the agent after a task: how many files do you think you didn't need to read because you knew each file's purpose first?
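If you'd rather not rely only on the agent's self-report, the same question can be roughly approximated from session data. This is a minimal sketch under stated assumptions: it assumes you can pull two sets out of a session, the files the agent read and the files it actually needed. "Needed" is approximated here as files it edited or cited, which is an assumption, not something gitsense or the linked repos define. The function name `unnecessary_read_ratio` is hypothetical.

```python
# Hypothetical metric: what fraction of files loaded into context were not
# needed for the task. "files_needed" is an assumed proxy (e.g. files the agent
# edited or cited in its answer); choose whatever proxy fits your setup.
def unnecessary_read_ratio(files_read: set[str], files_needed: set[str]) -> float:
    """Return the fraction of read files that were not in files_needed."""
    if not files_read:
        # Nothing was read, so nothing was read unnecessarily.
        return 0.0
    unnecessary = files_read - files_needed
    return len(unnecessary) / len(files_read)


if __name__ == "__main__":
    # Illustrative inputs only, not real session data.
    read = {"src/app.py", "src/db.py", "README.md", "tests/test_app.py"}
    needed = {"src/app.py", "tests/test_app.py"}
    print(f"{unnecessary_read_ratio(read, needed):.2f}")  # prints 0.50
```

Tracking this number per session over time is one way to compare harnesses, models, or skills without relying purely on vibes, though the result is only as good as the "needed" proxy.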