
by btown 77 days ago

It seems the benchmarks here are heavily biased towards single-shot explanatory tasks, not agentic loops where code is generated: https://github.com/drona23/claude-token-efficient/blob/main/...

And I think this raises a really important question. When you're deep into a project that's iterating on a live codebase, does Claude's default verbosity, where it's allowed to expound on why it's doing what it's doing when it's writing massive files, allow the session to remain more coherent and focused as context size grows? And in doing so, does it save overall tokens by making better, more grounded decisions?

The original link here has one rule that says: "No redundant context. Do not repeat information already established in the session." Personally, I want more of that redundancy, not less. Those are goal-oriented quasi-reasoning tokens that I do want it to emit, visualize, and use, and that very possibly keep it from getting "lost in the sauce."

By all means, use this in environments where output tokens are expensive, and you're processing lots of data in parallel. But I'm not sure there's good data on this approach being effective for agentic coding.

8 comments

sillysaurusx 77 days ago

I wrote a skill called /handoff. Whenever a session is nearing a compaction limit or has outlived its usefulness, it generates and commits a markdown file explaining everything it did or talked about. It’s called /handoff because you do it before a compaction. (“Isn’t that what compaction is for?” Yes, but those go away. This is like a permanent record of compacted sessions.)

I don’t know if it helps maintain long term coherency, but my sessions do occasionally reference those docs. More than that, it’s an excellent “daily report” type system where you can give visibility to your manager (and your future self) on what you did and why.

Point being, it might be better to distill that long term cohesion into a verbose markdown file, so that you and your future sessions can read it as needed. A lot of the context is trying stuff and figuring out the problem to solve, which can be documented much more concisely than by letting it fill up your context window.

EDIT: Someone asked for installation steps, so I posted it here: https://news.ycombinator.com/item?id=47581936

dataviz1000 77 days ago

Did you call it '/handoff' or did Claude name it that? The reason I'm asking is because I noticed a pattern with Claude subtly influencing me. For example, the first time I heard the word 'gate' was from Claude, and one week later I hear it everywhere, including on Hacker News. I didn't use the word 'handoff', but Claude creates handoff files also [0]. I was thinking about this all day, because Claude didn't just use the word 'gate'; it created an entire system around it that includes handoffs, which I'm starting to see everywhere. This might mean Claude is very quietly leading and influencing us in a direction.

[0] https://github.com/search?q=repo%3Aadam-s%2Fintercept%20hand...

sillysaurusx 77 days ago

I was reading through the Claude docs and it was talking about common patterns to preserve context across sessions. One pattern was a "handoff file", which they explained like "have claude save a summary of the current session into a handoff file, start a new session, then tell it to read the file."

That sounded like a nice idea, so I made it effortless beyond typing /handoff.

The generated docs turned out to be really handy for me personally, so I kept using it, and committed them into my project as they're generated.

dataviz1000 77 days ago

Oh, so the word 'gate' is probably in the documentation also!

I see. So this isn't as scary. Claude is helping me understand how to use it properly.

nerdsniper 76 days ago

I have noticed similar phenomena with Claude, where its vocabulary subtly shifts how I think/frame/write about things or points me to subtle gaps in my own understanding. And I also usually come around to understand that it's often not arbitrary. But I do think some confirmation bias is at play: when it tries to shift me into the wrong directions repeatedly, I learn how to make it stop doing that.

It definitely adds a layer of cognitive load, in wrangling/shepherding/accommodating/accepting the unpredictable personalities and stochastic behaviors of the agents. It has strong default behaviors for certain small tasks, and where humans would eventually habituate to prescribed procedures/requirements, the LLMs never really internalize my preferences. In that way, they are more like contractors than employees.

perching_aix 76 days ago

If this was more than just a gut reaction [0], I have a tough time navigating what swings this topic between scary and not scary for you.

Unless you're a true and invested believer of souls, free will, and other spiritualistic nonsense (or have a vested political affiliation to pretend so), it should be tautological that everything you read and experience biases you. LLM output then is no different.

If you are a believer, then either nothing ever did, or LLMs are special in some way, or everything else is. Which just doesn't make sense to me.

[0] It's jarring to observe the boundaries of one's agency, sure, but LLMs are really nothing special in this way. For example, I somewhat frequently catch myself using words and phrases I saw earlier during the day elsewhere, even if I did not process them consciously.

airstrike 77 days ago

Why would it be scary? Claude is just parroting other human knowledge. It has no goal or agency.

adrianN 77 days ago

You can’t verify that there is no influence by the makers of Claude.

fwipsy 77 days ago

By that logic, nothing computers do is scary.

jstanley 76 days ago

FWIW I have worked with people using the word "gate" for years.

For example, "let's gate the new logic behind a feature flag".

ProofHouse 76 days ago

They all are. This is proven in research. https://medium.com/data-science-collective/the-ai-hivemind-p...

reedlaw 76 days ago

Claude has trained me on the use of the word 'invariant'. I never used it before, but it makes sense as a term for a rule the system guarantees. I would have used 'validation' for application-side rules or 'constraint' for db rules, but 'invariant' is a nice generic substitute.

creamyhorror 76 days ago

I've started saying "gate" and "bound(ed)" and "handoff" a lot (and even "seam" and "key off" sometimes) since Codex keeps using the terms. They're useful, no doubt, but AI definitely seems to prefer using them.

flashgordon 77 days ago

I've actually been doing this for a year. I call it /checkpoint instead and it does something like:

* update our architecture.md and other key md files in folders affected by updates and learnings in this session.
* update claude.md with changes in workflows/tooling/conventions (not project summaries)
* commit

It's been pretty good so far. Nothing fancy. Recently I also asked to keep memories within the repo itself instead of in ~/.claude.

The only downside is that it is slow, but it keeps enough to pass the baton. Maybe "handoff" would have been a better name!

tstrimple 76 days ago

I've got something similar but I call them threads. I work with a number of different contexts and my context discipline is bad, so I needed a way to hand off work that was planned in one context but needs to be executed from another. I wanted a little bit of order to the chaos, so my threads skill will add and search issues created in my local forgejo repo. It gives me a convenient way to explicitly save session state to be picked up later.

I've got a separate script which parses the jsonl files that claude creates for sessions and indexes them in a local database for longer term searchability. A number of times I've found myself needing some detail I knew existed in some conversation history, but CC is pretty bad and slow at searching through the flat files for relevant content. This makes that process much faster and more consistent. Again, this is due to my lack of discipline with contexts. I'll be working with my recipe planner context and have a random idea that I just iterate with right there. Later I'll never remember that idea started from the recipe context. With this setup I don't have to.
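A rough sketch of what such an indexer could look like, not the commenter's actual script. It assumes each session is a JSONL file with one JSON object per line, and that the text sits under `message.content` as either a string or a list of parts with a `text` key. That schema and the names `extract_text`, `index_sessions`, and `search` are assumptions for illustration.

```python
import json
import sqlite3
from pathlib import Path


def extract_text(record):
    """Pull plain text out of one record (assumed schema, see lead-in)."""
    message = record.get("message")
    if not isinstance(message, dict):
        return ""
    content = message.get("content", "")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [p.get("text", "") for p in content if isinstance(p, dict)]
        return "\n".join(t for t in parts if t)
    return ""


def index_sessions(session_dir, db_path=":memory:"):
    """Index every *.jsonl file under session_dir into one SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(session TEXT, line INTEGER, text TEXT)"
    )
    for path in sorted(Path(session_dir).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for lineno, raw in enumerate(fh, start=1):
                try:
                    record = json.loads(raw)
                except json.JSONDecodeError:
                    continue  # skip blank or partially written lines
                if not isinstance(record, dict):
                    continue
                text = extract_text(record)
                if text:
                    conn.execute(
                        "INSERT INTO messages VALUES (?, ?, ?)",
                        (path.stem, lineno, text),
                    )
    conn.commit()
    return conn


def search(conn, term):
    """Return (session, line, text) rows whose text contains term."""
    rows = conn.execute(
        "SELECT session, line, text FROM messages "
        "WHERE text LIKE ? ORDER BY session, line",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

A plain `LIKE` scan keeps the sketch dependency-free. A full-text index would be the natural next step once the table grows.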

chermi 77 days ago

Did the same. Although I'm considering a pipeline where sessions are periodically translated to .md, with most tool outputs and other junk stripped, and using that as a source to query against for context. I am testing out a semi-continuous ingestion of it into my RAG/knowledge db.
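The stripping step might look roughly like the sketch below. It assumes a transcript where each JSONL record has a `type` of `user` or `assistant` and a `message.content` that is either a string or a list of parts, with only parts of type `text` worth keeping. The schema and the name `session_to_markdown` are assumptions, not a documented format.

```python
import json


def session_to_markdown(jsonl_text):
    """Render a session transcript as markdown, keeping only text parts."""
    sections = []
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore malformed lines
        if not isinstance(record, dict):
            continue
        role = record.get("type")
        if role not in ("user", "assistant"):
            continue  # drop summaries and other metadata records
        message = record.get("message")
        content = message.get("content", "") if isinstance(message, dict) else ""
        if isinstance(content, str):
            texts = [content]
        elif isinstance(content, list):
            # Tool calls and tool results are non-text parts, so they fall out here.
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
        else:
            texts = []
        body = "\n\n".join(t.strip() for t in texts if t.strip())
        if body:
            sections.append(f"## {role}\n\n{body}")
    return "\n\n".join(sections) + ("\n" if sections else "")
```

Turns that contain nothing but tool output disappear entirely, which is where most of the size reduction would come from.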

mlrtime 76 days ago

Wouldn't the next phase of this be automatic handoffs executed with hooks?

Your system is great and I do similar, my problem is I have a bunch of sessions and forget to 'handoff'.

The clawbots handle this automatically with journals to save knowledge/memory.

dominotw 76 days ago

when i work on a task i have a task/{name}.md that i write a running log to. is this not a common workflow?

david_allison 77 days ago

Is this available online? I'd love documentation of my prompts.

sillysaurusx 77 days ago

I’ll post it here, one minute.

Ok, here you go: https://gist.github.com/shawwn/56d9f2e3f8f662825c977e6e5d0bf...

Installation steps:

- In your project, download https://gist.github.com/shawwn/56d9f2e3f8f662825c977e6e5d0bf... into .claude/commands/handoff.md

- In your project's CLAUDE.md file, put "Read `docs/agents/handoff/*.md` for context."

Usage:

- Whenever you've finished a feature, done a coherent "thing", or otherwise want to document all the stuff that's in your current session, type /handoff. It'll generate a file named e.g. docs/agents/handoff/2026-03-30-001-whatever-you-did.md. It'll ask you if you like the name, and you can say "yes" or "yes, and make sure you go into detail about X" or whatever else you want the handoff to specifically include info about.

- Optionally, type "/rename 2026-03-23-001-whatever-you-did" into claude, followed by "/exit" and then "claude" to re-open a fresh session. (You can resume the previous session with "claude 2026-03-23-001-whatever-you-did". On the other hand, I've never actually needed to resume a previous session, so you could just ignore this step entirely; just /exit then type claude.)

Here's an example so you can see why I like the system. I was working on a little blockchain visualizer. At the end of the session I typed /handoff, and this was the result:

- docs/agents/handoff/2026-03-24-001-brownie-viz-graph-interactivity.md: https://gist.github.com/shawwn/29ed856d020a0131830aec6b3bc29...

The filename convention stuff was just personal preference. You can tell it to store the docs however you want to. I just like date-prefixed names because it gives a nice history of what I've done. https://github.com/user-attachments/assets/5a79b929-49ee-461...
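For anyone scripting the same naming scheme outside of Claude, here is a minimal sketch of picking the next date-prefixed name. It assumes the three-digit counter restarts each day, which is a guess based on the `-001-` in the examples above. The helper names `slugify` and `next_handoff_path` are hypothetical.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def next_handoff_path(handoff_dir, title, today=None):
    """Return the next free YYYY-MM-DD-NNN-slug.md path in handoff_dir."""
    today = today or date.today()
    prefix = today.isoformat()
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d{{3}})-.*\.md$")
    directory = Path(handoff_dir)
    used = []
    if directory.is_dir():
        for entry in directory.iterdir():
            match = pattern.match(entry.name)
            if match:
                used.append(int(match.group(1)))
    # Assumption: numbering is per day and continues from the highest seen.
    seq = max(used, default=0) + 1
    return directory / f"{prefix}-{seq:03d}-{slugify(title)}.md"
```

Calling `next_handoff_path("docs/agents/handoff", "Brownie viz: graph interactivity")` on a day with no earlier handoffs would give a name ending in `-001-brownie-viz-graph-interactivity.md`.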

Try to do a /handoff before your conversation gets compacted, not after. The whole point is to be a permanent record of key decisions from your session. Claude's compaction theoretically preserves all of these details, so /handoff will still work after a compaction, but it might not be as detailed as it otherwise would have been.

creamyhorror 76 days ago

I already do this manually each time I finish some work/investigation (I literally just say

"write a summary handoff md in ./planning for a fresh convo"

and it's generally good enough), but maybe a skill like you've done would save some typing, hmm

My ./planning directory is getting pretty big, though!

addandsubtract 76 days ago

Thanks! The last link is broken, though, or maybe you didn't mean to include it? Also, if you've never actually resumed a session, do you use these docs at some other time? Do you reference them when working on a related feature, or just keep them as a keepsake to track what you've done and why?

sillysaurusx 76 days ago

Thank you. It was just a screenshot of my handoff directory. I originally tried to upload to imgur but got attacked by ads, then uploaded to github via “new issue” pasting. I thought such screenshots were stable, but looks like GitHub prunes those now.

It wasn’t anything important. I appreciate you pointing that out though.

I just keep old sessions as a keepsake. No reason really. I thought maybe I’d want them for some reason but never did.

The docs are the important part. It helps me (and future sessions) understand old decisions.

david_allison 77 days ago

Oh wow, thank you so much!!!!!

cruffle_duffle 77 days ago

Thanks!!!

DeathArrow 77 days ago

I think Cursor does something similar under the hood.

alsetmusic 77 days ago

> No explaining what you are about to do. Just do it.

Came here for the same reason.

I can't calculate how many times this exact section of Claude output let me know that it was doing the wrong thing so I could abort and refine my prompt.

hatmanstack 77 days ago

Seems crazy to me that people aren't already including rules to prevent useless language in their system/project level CLAUDE.md.

As for redundancy, it's quite useful according to recent research. Pulled from Gemini 3.1: "two main paradigms: generating redundant reasoning paths (self-consistency) and aggregating outputs from redundant models (ensembling)." Both have fresh papers written about their benefits.
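To make the first paradigm concrete, here is a minimal sketch of self-consistency as a majority vote over several sampled answers. `sample_fn` is a hypothetical stand-in for whatever call returns one sampled final answer from a model. It is not a real API, and the normalization step is an assumption.

```python
from collections import Counter


def self_consistent_answer(sample_fn, prompt, n_samples=5):
    """Sample n_samples answers and return (most common answer, agreement)."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Each call is assumed to be an independent sample at non-zero temperature.
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    # Normalize lightly so " 42 " and "42" count as the same vote.
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / n_samples
```

The agreement ratio is a cheap signal in its own right: a low value suggests the samples disagree and the answer deserves a second look.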

wongarsu 76 days ago

There was also that one paper that had very noticeable benchmark improvements in non-thinking models by just writing the prompt twice. The same paper remarked how thinking models often repeat the relevant parts of the prompt, achieving the same effect.

Claude is already pretty light on flourishes in its answers, at least compared to most other SotA models. And for everything else it's not at all obvious to me which parts are useless. And benchmarking it is hard (as evidenced by this thread). I'd rather spend my time on something else.

whattheheckheck 77 days ago

No such thing as junk DNA kinda applies here

scosman 77 days ago

also: inference time scaling. Generating more tokens when getting to an answer helps produce better answers.

Not all extra tokens help, but optimizing for minimal length when the model was RL'd on task performance seems detrimental.

joquarky 76 days ago

I liked playing with the completion models (davinci 2/3). It was a challenge to arrange a scenario for it to complete in a way that gave me the information I wanted.

That was how I realized why the chat interfaces like to start with all that seemingly unnecessary/redundant text.

It basically seeds a document/dialogue for it to complete, so if you make it start out terse, then it will be less likely to get the right nuance for the rest of the inference.

dataviz1000 76 days ago

I made a test [0] which runs several different configurations against coding tasks ranging from easy to hard. Each task has a test which it has to pass. Because of temperature, the number of tokens per one-shot run varies widely across all the different configurations, including this one. However, across 30 tests, this one does perform worse.

[0] https://github.com/adam-s/testing-claude-agent
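Since token counts swing so much from run to run, comparing configurations means aggregating over repeated runs. The following is a minimal sketch of that aggregation, not code from the linked repo. The `(config, passed, output_tokens)` tuple layout and the name `summarize` are assumptions.

```python
from statistics import mean, stdev


def summarize(runs):
    """Aggregate (config, passed, output_tokens) tuples per configuration."""
    by_config = {}
    for config, passed, tokens in runs:
        by_config.setdefault(config, []).append((passed, tokens))
    summary = {}
    for config, results in by_config.items():
        tokens = [t for _, t in results]
        summary[config] = {
            "runs": len(results),
            "pass_rate": sum(1 for p, _ in results if p) / len(results),
            "mean_tokens": mean(tokens),
            # Sample standard deviation needs at least two runs.
            "stdev_tokens": stdev(tokens) if len(tokens) > 1 else 0.0,
        }
    return summary
```

Reporting the spread next to the mean makes it easier to tell a real difference between configurations from sampling noise.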

btown 75 days ago

This is an amazing analysis! Thank you for running this :)

matchagaucho 76 days ago

Some redundancy also helps to keep a running todo list on the context tip, in the event of compacting or truncation.

Distilled mini/nano models need regular reminders about their objectives.

As documented by Manus https://manus.im/blog/Context-Engineering-for-AI-Agents-Less...

0xbadcafebee 76 days ago

There's an ancient paper that shows repetition improves non-reasoning weights: https://arxiv.org/html/2512.14982v1

baq 76 days ago

if the model gets dumber as its context window is filled, any way of compressing the context in a lossless fashion should give a multiplicative gain in the 50% METR horizon on your tasks as you'll simply get more done before the collapse. (at least in the spherical cow^Wtask model, anyway.)