|
|
|
|
|
by bcherny
56 days ago
|
|
Hey, Boris from the team here. We did both -- we did a number of UI iterations (eg. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this). |
|
For instance:
Is Haiku supposed to hit a warm system-prompt cache in a default Claude code setup?
I had `DISABLE_TELEMETRY=1` in my env and found the haiku requests would not hit a warm-cached system prompt. E.g. on first request just now w/ most recent version (v2.1.118, but happened on others):
w/ telemetry off - input_tokens:10 cache_read:0 cache_write:28897 out:249
w/ telemetry on - input_tokens:10 cache_read:24344 cache_write:7237 out:243
I used to think having so many users was leading to people hitting a lot of edge cases, 3 million users is 3 million different problems. Everyone can't be on the happy path. But then I started hitting weird edge cases and started thinking the permutations might not be under control.