| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kiproping 22 days ago

This would be a better page to link to https://github.com/esengine/DeepSeek-Reasonix/blob/main/docs...

They explain some of the the reasons why they have a better solution and why they are very opinionated

>Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.

So they optimize on this plus other techniques to improve cache hits, making it cheaper.

4 comments

sparkleMing 21 days ago

The last time I heard about something like this, it was Claude Code intentionally injecting random strings to break caching when you're not using a Claude model. Aside from that kind of intentional sabotage, I don't think any coding agent would just ignore prefix caching.

link

ikurei 21 days ago

I haven't heard about this, could you please share more info, some reference on that Claude Code intentional bug?

link

davesque 21 days ago

I'm not sure what the mechanism is, but I've definitely had Claude refuse to work on sessions that were touched by other models. Some kind of integrity check failure. Resetting the session back to the point before I used the other model fixed the problem.

link

benjamincburns 21 days ago

IIRC Anthropic's API produces cryptographic signatures for thinking blocks. If you try to submit a set of messages that include thinking blocks with missing/invalid signatures, it'll refuse.

They do this to mitigate jailbreak attempts that rely on fabricated message history (e.g. making it look like the model was compliant in previous messages, increasing the likelihood that it'll continue to be compliant in future messages).

link

sparkleMing 20 days ago

https://x.com/hqmank/status/2056205388689891834

link

krackers 22 days ago

>Most agent loops reorder, rewrite, or inject fresh timestamps each turn

That's really surprising, since it'd defeat the whole point of KV caching. I mean I buy it considering how sloppily coded the harnesses seem to be, but this like obvious low hanging fruit.

I've also often wondered why LLMs aren't trained with a format of having a dedicated contextual system-instruction role at the _end_, which you could use to put context like current time or other misc stuff.

link

benjamincburns 21 days ago

I don't think it's factually correct.

There are context pruning strategies that will prune old messages that are no longer relevant, and context compaction from summaries, etc. But to say "most" do this on "every turn" is overstating things. I think it's more correct to say that "many" do this "occasionally."

I'm also not sure what they mean about injecting fresh timestamps. I could see why you'd prepend/append a timestamp to the user's messages to make the model aware of the current time, and the passage of time, but I can't think of any good reason to edit timestamps in prior messages. I'm sure someone can come up with one, but I'd be very surprised if this was a thing that most agent loops do, let along doing it on every turn.

link

radio879 19 days ago

i put together this, for myself so i can try to track what coding agents are doing, I add agents to it or topics (like caching, or sandboxing, file editing methods, etc) just to try and find anything novel or good, since I am/was considering making a new harness but using all the best things from any of those. I still cannot find my perfect coding agent, every one of them has some problem or just not totally what it could be.

What I do is just point agents to a folder, have it loop around a few times on a repo, fact checks at the end, but people sometimes think the software/harness around the AI model doesn't do much which is TOTALLY wrong, its probably AS important or more.. file editing methods available matter a lot, context compaction methods... matter, caching matters. I am still fantasizing about a "best of N" coding agent, that tries to take all the best stuff from all of them.

I have an idea of a coding agent that puts a lot more effort into using more than one model at the same time. Sooo much can be done with that idea.. and no one is apparently doing it yet that I can find. I just am not sure I want to put that much time into a new coding agent project. I wonder how autonomous it could be - have weekly or daily scans of the current coding agent landscape and automatic scanning of coding agent/ai code related subreddits/hacker news, analyze it to figure out what the current problems are, complaints about existing coding agents, desires --> prioritized list of possible features/fixes ---> ai agents code and make releases

https://agents.buttonscli.com

link

jeremyjh 21 days ago

Its not surprising, that doc is full of AI slop.

link

embedding-shape 21 days ago

> Most agent loops reorder, rewrite, or inject fresh timestamps each turn

I haven't seen that, it'd be crazy slow if they did this. What "agent loops" are they talking about here specifically? The vagueness makes it sound potentially made up.

link

vidarh 21 days ago

I've never seen an agent loop "reorder, rewrite, or inject fresh timestamps" each turn other than mostly towards the end of the messages. Messing with a large part of the context every turn would be a fairly crazy thing to do.

link

nawitus 21 days ago

Yeah. Those claims are just some random AI slop from claude..

link

vidarh 21 days ago

It's a really lazy one too - there are so many open source harnesses, including e.g. Codex and Kimi-CLI, and of course the leaked Claude Code source, so it's trivial to verify if someone even just bothered to ask an agent to check actual source code examples.

link