HN Mirror


by compuficial 20 hours ago

> 1. Gamified Savings vs. Your Actual API Bill

Tool-use output accounts for a large share of my output. I'll take 3.7M tokens saved on 3.9M tokens of input. Tokens saved are tokens saved.

> 3. Where Are the Accuracy Benchmarks?

As a user of RTK, I would like to see accuracy benchmarks. However, I've seen no evidence of the model missing anything critical as a result of the compression. As part of their design philosophy, they are very strict about preserving correctness, to the point that if a filter fails, they fall back to the raw output. For my most frequently used commands, I've inspected the source and was happy with what I saw; they've earned my trust so far.

> The day git, cargo, npm, or grep updates its terminal formatting by a few spaces or changes an error layout, RTK's regex and parsing filters will break. And returning to the silent failure trap, it won't throw an explicit error; it will fail quietly, feeding corrupted or partial text to your agent.

Again, any filter that fails simply falls back to the raw output. One of their core pillars is avoiding this exact scenario you described. RTK should never feed corrupted or partial text to an agent.
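The fallback idea can be sketched roughly as follows. This is a hypothetical filter for an assumed `git status --porcelain` layout, not RTK's code, and the names `group_by_status` and `apply_filter` are made up for illustration. A fallback only protects the agent when the filter actually notices a mismatch, so this sketch rejects any line it does not recognize instead of quietly skipping it.

```python
import re
from typing import Callable

# Hypothetical illustration, not RTK's implementation.
# Assumed input: `git status --porcelain` (v1) lines, i.e. a two-character
# status code, a space, then the path.
STATUS_LINE = re.compile(r"^([ MTADRCU?!]{2}) (.+)$")


def group_by_status(raw: str) -> str:
    """Compact the listing by grouping paths under their exact status code.

    Any line that does not match the expected layout raises ValueError, so a
    formatting change surfaces as a failure instead of a shortened listing.
    """
    groups: dict[str, list[str]] = {}
    for line in raw.splitlines():
        match = STATUS_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        groups.setdefault(match.group(1), []).append(match.group(2))
    if not groups:
        raise ValueError("nothing to compress")
    return "\n".join(f"[{code}] {', '.join(paths)}" for code, paths in groups.items())


def apply_filter(filter_fn: Callable[[str], str], raw: str) -> str:
    """Return the filtered text, or the untouched raw output if the filter fails."""
    try:
        return filter_fn(raw)
    except Exception:
        # Any parsing failure: pass the raw output through unchanged.
        return raw


print(apply_filter(group_by_status, " M src/a.py\n M src/b.py\n?? notes.txt"))
# [ M] src/a.py, src/b.py
# [??] notes.txt
```

Keeping the two-character code verbatim (rather than stripping spaces) avoids merging staged and unstaged changes. The limitation the parent comment worries about still applies: a format change that happens to parse cleanly would not trigger the fallback.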

Your concerns are fair but I'd like to see your criticism backed up with evidence. Have you used RTK? Have you found evidence that they are failing to preserve correctness?


lackoftactics 20 hours ago

I was looking through the issues to investigate. A few that caught my attention look quite bad: https://github.com/rtk-ai/rtk/issues/2494 https://github.com/rtk-ai/rtk/issues/2462 https://github.com/rtk-ai/rtk/issues/2395


compuficial 20 hours ago

FWIW, I just ran the steps to reproduce and got `Error: prettier produced no output` on rtk (0.42.2). I'm not saying this isn't valid for the user's environment, but I could not reproduce it on Linux.


lackoftactics 19 hours ago

Appreciate the engineering effort and skin in the game. I might try it on macOS today, like the author of the issue.


Sayrus 17 hours ago

> Tokens saved are tokens saved.

Not always. RTK strips flags and other information, and sometimes you spend more tokens getting it back later. Sure, you saved 70% of the tokens on that tool call, but nothing in the metrics says whether you ran 3 tool calls instead of 1.

There is also a question of whether that stripped output requires more thinking tokens or not.
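To make that accounting concrete, here is a minimal sketch with made-up numbers. The `Step` record and every figure are hypothetical, not RTK's metrics: one raw call versus a filtered call that saves 70% but then needs two follow-up calls to recover stripped detail. Reasoning tokens are counted as well, since they may also shift.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One tool call in an agent task (hypothetical accounting record)."""
    result_tokens: int         # tool output fed back to the model
    reasoning_tokens: int = 0  # model tokens spent deciding the next step


def task_cost(steps: list[Step]) -> int:
    """Total tokens a task consumed across every tool call it needed."""
    return sum(s.result_tokens + s.reasoning_tokens for s in steps)


def first_call_saving_pct(raw: list[Step], filtered: list[Step]) -> float:
    """Saving on the first tool call alone, i.e. what a per-call gauge shows."""
    return 100 * (1 - filtered[0].result_tokens / raw[0].result_tokens)


def net_saving(raw: list[Step], filtered: list[Step]) -> int:
    """Tokens saved over the whole task; negative means the filter cost tokens."""
    return task_cost(raw) - task_cost(filtered)


# Illustrative numbers only.
raw = [Step(result_tokens=1000, reasoning_tokens=50)]
filtered = [
    Step(result_tokens=300, reasoning_tokens=50),  # compressed first call
    Step(result_tokens=400, reasoning_tokens=80),  # follow-up to recover detail
    Step(result_tokens=400, reasoning_tokens=80),  # another follow-up
]

print(f"{first_call_saving_pct(raw, filtered):.0f}% saved on the first call")  # 70% saved on the first call
print(net_saving(raw, filtered))  # -260
```

A per-call gauge reports 70% here even though the whole task used 260 more tokens; only task-level totals capture the difference.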


Zababa 5 hours ago

I don't think being very strict about preserving correctness is enough. Considering the cost differences between the latest model and an open weight one that's behind, or between the biggest model and the one below it, I think you have to measure performance very carefully.

Rather than the criticism needing to be backed up with evidence, it's up to RTK to prove they don't degrade performance.
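One way to measure this carefully is a paired comparison: run the same task set twice, once with raw tool output and once with compressed output, record pass/fail per task, and compare. The sketch below uses an exact McNemar test on the discordant pairs; the task data and function names are hypothetical, and nothing here reflects benchmarks RTK actually publishes.

```python
from math import comb


def paired_pass_rates(raw: list[bool], compressed: list[bool]) -> tuple[float, float]:
    """Pass rate on the same task set with raw vs compressed tool output."""
    if len(raw) != len(compressed) or not raw:
        raise ValueError("need the same non-empty task set for both runs")
    n = len(raw)
    return sum(raw) / n, sum(compressed) / n


def mcnemar_exact_p(raw: list[bool], compressed: list[bool]) -> float:
    """Two-sided exact McNemar p-value computed from the discordant task pairs."""
    only_raw = sum(r and not c for r, c in zip(raw, compressed))   # passed only with raw output
    only_comp = sum(c and not r for r, c in zip(raw, compressed))  # passed only when compressed
    n = only_raw + only_comp
    if n == 0:
        return 1.0  # the runs never disagree: no evidence of a difference
    k = min(only_raw, only_comp)
    # Under "no effect", each discordant task is equally likely to go either way.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical results for 10 tasks.
raw_runs = [True] * 10
compressed_runs = [True] * 4 + [False] * 6
print(paired_pass_rates(raw_runs, compressed_runs))  # (1.0, 0.4)
print(mcnemar_exact_p(raw_runs, compressed_runs))    # 0.03125
```

A small p-value suggests the drop is unlikely to be chance. A large p-value on a small task set does not show that compression is harmless, which is why the task set has to be big enough to detect the differences that matter.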
