We cut Claude's token usage 79% by redesigning our CLI for agents (infracost.io)
12 points by glenngillen 29 days ago
4 comments

Co-founder of Infracost here. We launched Infracost on HN five years ago, when the CLI only generated cost estimates for Terraform. Earlier this year we were scoping a 1.0 release in which the CLI would stop being just a cost-estimation tool and start surfacing the issues behind the costs: previous-generation instances, policy violations, the kinds of issues a thorough PR review would catch.

Then agent traffic started showing up, and it became clear the 1.0 scope was the right idea aimed at the wrong caller. A human reviewer reads a PR comment; an agent runs `infracost inspect --filter` ... and gets the same insight as a tabular row it can pipe into the next step. So we decided to skip our planned 1.0 release and go straight to 2.0, which treats agents as first-class users of the CLI.

Along the way we picked up some interesting lessons on optimizing user token usage when designing a CLI, and we want to share them with the HN community since other CLI builders might benefit.

Why did you choose to introduce a --llm flag instead of simply detecting that the command is being run by an agent using a library such as @vercel/detect-agent and outputting the tool-formatted output automatically? We recently worked on optimizing our CLI for agents at Alpic and discovered that agents often forget to use the --json flag.
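To make the trade-off concrete, here is a minimal sketch of combining both approaches: an explicit flag always wins, and environment detection is only a fallback for when the agent forgets it. This is not how Infracost or @vercel/detect-agent actually work; the environment variable names and the `choose_output_mode` helper are hypothetical.

```python
# Hypothetical environment variable names. Real agent harnesses may set
# different variables, or none at all.
AGENT_ENV_HINTS = ("EXAMPLE_AGENT", "EXAMPLE_AGENT_SESSION")


def choose_output_mode(argv, env, stdout_is_tty):
    """Return "llm", "json", "human", or "plain" for this invocation."""
    # An explicit flag always wins, so scripted callers stay predictable.
    if "--llm" in argv:
        return "llm"
    if "--json" in argv:
        return "json"
    # Fallback: guess from the environment when the caller forgot the flag.
    if any(env.get(name) for name in AGENT_ENV_HINTS):
        return "llm"
    # Piped output is not proof of an agent, so only drop colors and
    # progress bars instead of switching formats.
    return "human" if stdout_is_tty else "plain"
```

A real entry point would call it with something like `choose_output_mode(sys.argv[1:], os.environ, sys.stdout.isatty())`. The downside of the fallback is that detection is a guess, which is one plausible reason to prefer an explicit flag.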
I'd treat the agent-facing output as an API, not just a display format. Once prompts and tools depend on it, a harmless CLI cleanup can break behavior the same way changing a JSON field would. The win here seems less about token count by itself and more about reducing inference from terminal decoration.
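One way to enforce that, sketched under assumptions: put a version line in the output and pin the exact rendering with a golden test, so a "harmless cleanup" fails CI instead of silently breaking prompts. The column names, the `#format=` line, and the sample row are all made up for illustration and are not Infracost's real format.

```python
# Hypothetical column contract. Bump FORMAT_VERSION whenever COLUMNS changes.
FORMAT_VERSION = 1
COLUMNS = ("resource", "issue", "monthly_cost_usd")


def render_rows(rows):
    """Render a list of dicts as a versioned, tab-separated table."""
    lines = [f"#format={FORMAT_VERSION}", "\t".join(COLUMNS)]
    for row in rows:
        lines.append("\t".join(str(row[col]) for col in COLUMNS))
    return "\n".join(lines)


# Illustrative sample data, not real Infracost output.
SAMPLE = [
    {
        "resource": "example_vm.web",
        "issue": "previous-generation instance",
        "monthly_cost_usd": "12.50",
    }
]

# Golden output: any change to the version line, column order, or
# separators makes the test below fail.
GOLDEN = (
    "#format=1\n"
    "resource\tissue\tmonthly_cost_usd\n"
    "example_vm.web\tprevious-generation instance\t12.50"
)


def test_agent_output_is_stable():
    assert render_rows(SAMPLE) == GOLDEN
```

The test treats the text itself as the contract, the same way you would pin a JSON schema.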
Designing interfaces specifically for agents (M2M DX) is a fascinating shift from traditional human-centric CLI design. We're moving from a world where "pretty" output and progress bars mattered to a world where raw, structured density is the goal. A 79% reduction is massive, but I wonder if we’ll see a new type of "Agent-Optimized" protocol emerge that completely bypasses the text-heavy nature of current CLIs. The overhead of an LLM trying to parse "human" terminal output is essentially a tax on every call.
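A rough way to see that tax is to render the same rows both ways and compare sizes. The sketch below uses made-up rows and counts characters, which is only a crude proxy for tokens; it says nothing about how the 79% figure was measured.

```python
# Illustrative rows, not real CLI output.
ROWS = [
    ("example_vm.web", "previous-generation instance"),
    ("example_db.main", "policy violation"),
]


def render_pretty(rows):
    """Boxed table with borders and padding, as a human-facing CLI might print."""
    width0 = max(len(r[0]) for r in rows)
    width1 = max(len(r[1]) for r in rows)
    border = "+" + "-" * (width0 + 2) + "+" + "-" * (width1 + 2) + "+"
    lines = [border]
    for resource, issue in rows:
        lines.append(f"| {resource.ljust(width0)} | {issue.ljust(width1)} |")
        lines.append(border)
    return "\n".join(lines)


def render_dense(rows):
    """One tab-separated line per row, with no decoration."""
    return "\n".join(f"{resource}\t{issue}" for resource, issue in rows)


def decoration_overhead(rows):
    """Characters spent on borders and padding that carry no information."""
    return len(render_pretty(rows)) - len(render_dense(rows))
```

Every character counted by `decoration_overhead` is something the model has to read on each call without learning anything from it.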
I mostly agree with what you said (the difference being that we've kept the "pretty" output and progress bars for the human-centric output). I also found it a fun exercise: trawling through the log output of the LLM tests to see why things were slow at times felt like watching a bunch of usability tests. The agents took various approaches to solving problems that seem obvious in hindsight but were not at all paths we'd optimised for. If you've ever done usability testing on something you've built, you probably have a sense of what I'm talking about.

And yes, one of the outcomes of this was also ditching the human output for something more dense and LLM friendly.