Hacker News
by lackoftactics 1 day ago
Author of the text here. I will be honest about why I wrote it: rtk ai looks very odd to me as a software engineer, given the number of stars, the lack of any mention of accuracy, and how management is pushing that stuff to optimize costs. Now people are wrapping every possible command in rtk, trying to handle all the major commands and decide which output you should get.
2 comments

Would sincerely love to hear your thoughts on https://www.github.com/jahala/tilth - it’s a different approach than RTK, benchmarked to reduce cost per correct answer by ~40%
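For anyone unfamiliar with the metric: "cost per correct answer" can be read as total spend divided by the number of tasks answered correctly. Below is a minimal sketch under that assumption (the tilth benchmark may define it differently). The function name and all numbers are hypothetical, chosen only to show why the metric punishes savings that come with lost accuracy.

```python
def cost_per_correct(total_cost_usd: float, correct_answers: int) -> float:
    """Assumed definition: total spend divided by correctly answered tasks."""
    if correct_answers <= 0:
        raise ValueError("need at least one correct answer")
    return total_cost_usd / correct_answers


# Hypothetical benchmark runs over the same task set (made-up numbers).
baseline = cost_per_correct(10.00, 80)      # no compression tool
same_quality = cost_per_correct(6.00, 80)   # cheaper, accuracy unchanged
worse_quality = cost_per_correct(6.00, 40)  # cheaper, but half the answers wrong

# Relative reduction in cost per correct answer versus the baseline.
reduction_same = 1 - same_quality / baseline
reduction_worse = 1 - worse_quality / baseline

print(f"same accuracy:  {reduction_same:+.0%}")
print(f"lower accuracy: {reduction_worse:+.0%}")
```

In the second hypothetical run the raw bill drops by the same amount, yet the cost per correct answer gets worse, which is the effect a token-savings figure alone would hide.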
Looked at your repo, even starred it. On the surface, I like your approach a bit better. It looks like your idea sits in the space between semantic search and token compression. I was into semantic search before, but I mostly tried to vectorize the codebase instead of using tree-sitter, and I couldn’t make semantic search work for me. Thanks for sharing!
An ex-colleague is working on Headroom, a much more legit alternative to RTK. They provide accuracy benchmarks in the repo and are transparent about the compression algorithms used for the different output types. I liked their approach a lot better than RTK and thought it might be relevant for you.

https://github.com/chopratejas/headroom

This thread is gold. It looks like combining both tools could reduce token consumption by 50%, essentially doubling what you get out of the subscription? Will be testing this out after morning coffee for sure.
Headroom uses RTK under the hood.

I applaud the benchmarking though.

That's already better than RTK because you measure task accuracy AND savings! So I'm more confident in this one than in the RTK/caveman/ponytail stuff.

There are still two things that bother me:

1) I don't really know when tilth is called, or how it works. Does the model itself select it when it needs it? Do you need to instruct the model to use it?

2) If the model itself chooses to use it, I'd like to see a non-regression benchmark on tasks where tilth isn't helping, to ensure you made the model + harness + tools better as a whole rather than just more specialized. Or be upfront that it is more specialized and give more details on when to use it and when not to.

Very cool. I'll probably switch away from AFT to this. Can you add tree-sitter-bash?
Why didn’t you offer any real-world usage numbers to illustrate your point? I found this unhelpful.
I read another, oddly similar post earlier today that has more explicit data on that author's codebase: https://codepointer.substack.com/p/cutting-llm-token-costs-w...

TL;DR: ~3-4% savings on actual API costs with rtk, caveman, and headroom combined, but nothing tangible on whether those cost reductions came at the cost of quality. By their calculations, rtk saved them $4.96 on a $926 bill.
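To put the quoted rtk figure in perspective, here is the arithmetic as a short sketch. It assumes the $926 is the bill the $4.96 is measured against (the linked post may compute it slightly differently), and the function name is only illustrative.

```python
def savings_share(saved_usd: float, bill_usd: float) -> float:
    """Fraction of the bill that the savings represent."""
    return saved_usd / bill_usd


# Figures quoted in the comment above.
rtk_share = savings_share(4.96, 926.0)

# Dollar range implied by the quoted ~3-4% combined savings on the same bill.
combined_low_usd = 0.03 * 926.0
combined_high_usd = 0.04 * 926.0

print(f"rtk alone: {rtk_share:.2%} of the bill")
print(f"combined:  ${combined_low_usd:.2f} to ${combined_high_usd:.2f}")
```

So rtk on its own accounts for roughly half a percent of that bill.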

^recommend reading this one
Thanks for the link! I thought I knew every obscure law, and I was so wrong.
That’s a fair point. The rtk promotional posts point to 60-90% token savings, and there is no mention of how they perform accuracy-wise. The commenter below did a great job pointing to a resource showing caveman and rtk saving just a couple of bucks on a $926 bill. Thanks, Llyoyd Christmas, for linking to a useful Substack.