| HN Mirror



	by bcherny 56 days ago
	Hey, Boris from the team here. We did both -- we did a number of UI iterations (eg. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).


big_toast 56 days ago

Having a "Recovery Mode"/"Safe Boot" flag to disable our configurations (or progressively enable) to see how claude code responds would be nice. Sometimes I get worried some old flag I set is breaking things. Maybe the flag already exists? I tried Claude doctor but it wasn't quite the solution.

For instance:

Is Haiku supposed to hit a warm system-prompt cache in a default Claude Code setup?

I had `DISABLE_TELEMETRY=1` in my env and found the haiku requests would not hit a warm-cached system prompt. E.g. on first request just now w/ most recent version (v2.1.118, but happened on others):

w/ telemetry off - input_tokens:10 cache_read:0 cache_write:28897 out:249

w/ telemetry on - input_tokens:10 cache_read:24344 cache_write:7237 out:243

I used to think having so many users was leading to people hitting a lot of edge cases, 3 million users is 3 million different problems. Everyone can't be on the happy path. But then I started hitting weird edge cases and started thinking the permutations might not be under control.


EugeneOZ 56 days ago

> people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this

UI is UI. It is naive to expect that you build some UI but users will "just magically" find out that they should use it as a terminal in the first place.


taytus 56 days ago

“after evals and dogfooding” Couldn’t you have done this before releasing the model? We are paying $200/month to beta test the software for you.


abtinf 56 days ago

You didn’t anticipate most people stick with defaults?


bcherny 55 days ago

We anticipated the default would be the best option for most people. We were wrong, so we reverted the default.


troupo 55 days ago

It took you a month to revert after multiple complaints. You still blamed users for using the product exactly as you advertised it. And all of your official channels were completely quiet for two months, whether it was about new draconian peak hour limits, or about the new defaults, or about exponentially increasing token costs.

People literally started seeing issues immediately as you changed the defaults: https://x.com/levelsio/status/2029307862493618290 And despite a huge amount of reports you still kept it for a whole month.

And then you shipped a completely untested feature with prompt cache misses and literally gaslit users and blamed users for using the product as advertised.

Oh. Remember this https://x.com/bcherny/status/2024152178273989085? "We move fast but test carefully"?

Now an untold number of people have been hit by these changes, so as an apology you reset usage limits three hours before they would reset anyway.

Good job.

Edit. By the way, a very telling sentence from the report:

--- start quote ---

We’ll ensure that a larger share of internal staff use the exact public build of Claude Code (as opposed to the version we use to test new features); and we'll make improvements to our Code Review tool that we use internally

--- end quote ---

Translation: no one is using or even testing the product we ship, and we blindly trust Claude Code to review and find bugs for us. Last one isn't even a translation: https://x.com/bcherny/status/2017742750473720121


krade 56 days ago

Off topic, but I'm hoping you'll maybe see this. There's been an issue with the VS Code extension that makes it pretty much impossible to use (PreToolUse can't intercept permission requests anymore, and using PermissionRequest hooks always opens the diff viewer and steals focus):

https://github.com/anthropics/claude-code/issues/36286 https://github.com/anthropics/claude-code/issues/25018
