| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by chrislloyd 138 days ago
	Hi, this was my test! The plan-mode prompt has been largely unchanged since the 3.x series models and now 4.x get models are able to be successful with far less direction. My hypothesis was that shortening the plan would decrease rate-limit hits while helping people still achieve similar outcomes. I ran a few variants, with the author (and few thousand others) getting the most aggressive, limiting the plan to 40 lines. Early results aren't showing much impact on rate limits so I've ended the experiment. Planning serves two purposes - helping the model stay on track and helping the user gain confidence in what the model is about to do. Both sides of that are fuzzy, complex and non-obvious!

6 comments

nextzck 138 days ago

The 40-line cap not moving rate limits makes sense - plan text is cheap. The cost is in Phase 1 exploration.

Plan mode spins up to 3 explore subagents before the planner even starts, and the heuristic is "use multiple when scope is uncertain." It won't choose fewer - it's being asked to plan, so scope is always uncertain. Nothing penalizes claude for over-exploring and nothing rewards restraint.

Plan mode also ignores session state. A cold start gets the same fanout as a warm session where the relevant files are already in context. In a warm session the explore pass is pure waste - it re-reads loaded files and feeds the planner lossy summaries that conflict with what it already knows.

More tokens, worse plan.

If exploration was conditional on what's already in context..skip it for warm sessions, keep it for cold starts - that does more for both rate limits and plan quality than a hard 40-line cap.

Note: plan mode didn’t always have this 3 subagent fan out behavior attached to it, it was introduced around opus 4.6 launch.

link

okwhateverdude 138 days ago

How can we opt-out of these tests? The behavior foibles I've been experiencing over the past month might be directly attributable to these experiments! It can be extreme frustrating. I don't want to be in the beta channel. Please change this to be opt-in.

link

ramoz 138 days ago

Thanks for the transparency. Sorry for the noise.

I think I'd be okay with a smaller, more narrative-detailed plan - not so much about verbosity, more about me understanding what is about to happen & why. There hadn't been much discourse once planning mode entered (ie QA). It would jump into its own planning and idle until I saw only a set of projected code changes.

link

BAM-DevCrew 138 days ago

As a divergent thinker with extensive hard constraints in claude.mds and on-boarding commands that force claude to internalize my constraints, that you or some other employee of Anthropic could randomly select me for testing is horrifying. Each unexpected behavior and my corresponding reaction to it can wipe me out, my brain out, completely for hours, days, even weeks. I have in the last year spend tens (estimating around 400) of hours establishing and reestablishing a system to protect myself from psychological harm and financial harm. It is twisted that you Anthropic employees do not consider the impact your work has on divergent thinking Claude users, let alone that real work is severly impacted by your work. Totally irresponsible. Offensively so.

link

shepherdjerred 138 days ago

What?

Even without Anthropic's experimentation, anything in the context is completely probabilistic.

You cannot rely on it no matter how/how much you prompt the model

link

BAM-DevCrew 138 days ago

And how does one address the fragility of probabilities? Engineering. Study weaknesses and harden them. Control the probabilities. It is NOT "completely" probabilistic.

link

PufPufPuf 138 days ago

I can't tell whether something is satire anymore.

link

oakwhiz 138 days ago

Shouldn't you be giving people their tokens back when you used their tokens to test on their environment?

link

bartread 138 days ago

I don't mind you testing stuff out - it's the only sensible way to make the app better - but you need to give people choices to switch to different behaviours if the behaviour you're testing on them isn't working out well for them.

In other news, Claude Code login is down, so if you have time it would be sensible to proiritise fixing that:

Authorization failed Redirect URI http:/localhost:53025/callback is not supported by client.

MacOS Sequoia, VS Code 1.111.0, Firefox 147.0.4 (although also fails on Chrome 145.0.7632.160).

This just started happening as of this evening. I've tried restarting everything, and it doesn't help.

link