| HN Mirror


	by lackoftactics 1 day ago
	Author of the text here. I'll be honest about why I wrote it: the rtk ai looks very odd to me as a software engineer, given the number of stars, no mention of accuracy, and how management is pushing that stuff to optimize costs. Now people are wrapping every possible command in rtk, trying to handle all the major commands and decide which output you should get.

2 comments

jahala 1 day ago

Would sincerely love to hear your thoughts on https://www.github.com/jahala/tilth - it’s a different approach than RTK, benchmarked to reduce cost per correct answer by ~40%

lackoftactics 1 day ago

Looked at your repo, even starred it. On the surface, I like your approach a bit better. It looks like your idea sits in the space between semantic search and token compression. I was into semantic search before, but I mostly tried to vectorize the codebase instead of using tree-sitter, and I couldn’t make semantic search work for me. Thanks for sharing!

jvican 1 day ago

An ex-colleague is working on Headroom, a much more legit alternative to RTK. They provide accuracy benchmarks in the repo and are transparent about the compression algorithms used for the different output types. I liked their approach a lot better than RTK's and thought it might be relevant for you.

https://github.com/chopratejas/headroom

baq 1 day ago

This thread is gold. Looks like setting up a combination of both tools could reduce token consumption by 50%, essentially doubling the subscription? Will be testing this out after morning coffee for sure.
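
The "50% fewer tokens means double the subscription" arithmetic can be checked with a small sketch. It assumes the reduction applies uniformly to every token counted against the quota, which the thread does not establish; `usage_multiplier` is a hypothetical helper name, not part of any tool mentioned here.

```python
def usage_multiplier(token_reduction: float) -> float:
    """Return how much further a fixed token budget stretches when each
    task consumes `token_reduction` (a fraction in [0, 1)) fewer tokens.

    Assumption: the reduction applies to all tokens billed against the quota.
    """
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1 / (1 - token_reduction)


# A 50% reduction doubles the work that fits in the same budget.
print(usage_multiplier(0.50))

# A reduction of a few percent barely moves the multiplier.
print(round(usage_multiplier(0.04), 3))
```

The multiplier only reaches 2x if the full 50% holds across the whole bill; a reduction that applies to only part of the traffic yields a smaller overall fraction and a smaller multiplier.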

oxavier 17 hours ago

Headroom uses RTK under the hood.

I applaud the benchmarking though.

Zababa 1 day ago

That's already better than RTK because you measure task accuracy AND savings! So I'm more confident in this one than in the RTK/caveman/ponytail stuff.

There are still two things that bother me:

1) I don't really know when tilth is called or how it works. Does the model itself select it when it needs it? Do you need to instruct the model to use it?

2) If the model itself chooses to use it, I'd like to see a benchmark of non-regressions on tasks where tilth isn't helping, to ensure you made the model + harness + tools as a whole better rather than more specialized; or be upfront about it being more specialized and give more details on when to use it and when not to.

polski-g 1 day ago

Very cool. I'll probably switch away from AFT to this. Can you add tree-sitter-bash?

ianwalter 1 day ago

Why didn’t you offer any real-world usage numbers to illustrate your point? I found this unhelpful.

lloyd-christmas 1 day ago

I read another, oddly similar post earlier today that has more explicit data on that author's codebase: https://codepointer.substack.com/p/cutting-llm-token-costs-w...

TL;DR: ~3-4% savings on actual API costs with rtk, caveman, and headroom combined, but nothing tangible on whether those cost reductions came at the cost of quality. By their calculations, rtk saved them $4.96 on a $926 bill.
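
To put the quoted figures in proportion, here is a minimal sketch that turns them into a share of the bill. It assumes the $926 is the total the savings are measured against, which the summary above does not spell out; `savings_share` is a hypothetical helper name.

```python
def savings_share(saved: float, bill: float) -> float:
    """Return `saved` as a percentage of `bill`."""
    if bill <= 0:
        raise ValueError("bill must be positive")
    return 100 * saved / bill


# Figures quoted in the comment above: $4.96 saved by rtk on a $926 bill.
rtk_share = savings_share(4.96, 926)
print(f"rtk share of bill: {rtk_share:.2f}%")
```

On these numbers the rtk share comes out at roughly half a percent of the bill, well below the ~3-4% reported for the three tools combined.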

bcollins34 1 day ago

^recommend reading this one

fumeux_fume 1 day ago

https://en.wikipedia.org/wiki/Brandolini%27s_law

lackoftactics 1 day ago

Thanks for the link! I thought I knew every obscure law, and I was so wrong.

lackoftactics 1 day ago

That’s a fair point. The rtk promotional posts point to 60-90% token savings, and there is no mention of how they perform accuracy-wise. The commenter below did a great job pointing to a resource showing caveman and rtk saving just a couple of bucks on a $926 bill. Thanks, Lloyd Christmas, for linking to a useful Substack.
