by cmenge 49 days ago
Bit surprised about the amount of flak they're getting here. I found the article seemed clear, honest and definitely plausible.

The deterioration was real and annoying, and it shines a light on the problematic lack of transparency about what exactly is going on behind the scenes, and on the somewhat arbitrary token-cost-based billing. There are too many factors at play; if you wanted to trace all of that as a user, you might as well just do the work yourself.

The fact that waiting for a long time before resuming a convo incurs additional cost and lag seemed clear to me from having worked with LLM APIs directly, but it might be important to make this more obvious in the TUI.


I agree that it’s plausible, and I hope they learn. But trust is earned, and Anthropic’s public responses this past month were dismissive and unhelpful.

Every one of these changes had the same goal: trading the intelligence users rely on for cheaper or faster outputs. Users adapt to how a model behaves, so sudden shifts without transparency are disorienting.

The timing also undercuts their narrative. The fixes landed right before another change with the same underlying intent rolled out. That looks more like they were just reacting to experiments rather than understanding the underlying user pain.

When people pay hundreds or thousands a month, they expect reliability and clear communication, ideally opt-in. Competitors are right there, and unreliability pushes users straight to them.

All of this points to their priorities not being aligned with their users’.

> All of this points to their priorities not being aligned with their users’.

Framing this as "aligned" or "not aligned" ignores the interesting reality in the middle. It is banal to say an organization isn't perfectly aligned with its customers.

I'm not disagreeing with the commenter's frustration. But I think it can help to try something out: take say the top three companies whose product you interact with on a regular basis. Take stock of (1) how fast that technology is moving; (2) how often things break from your POV; (3) how soon the company acknowledges it; (4) how long it takes for a fix. Then ask "if a friend of yours (competent and hard working) was working there, would I give the company more credit?"

My overall feel is that people underestimate the complexity of the systems at Anthropic and the chaos of the growth.

These kinds of conversations are a sort of window into people's expectations and their ability to envision the possible explanations of what is happening at Anthropic.

>My overall feel is that people underestimate the complexity of the systems at Anthropic and the chaos of the growth.

Making changes like reducing the usage window at peak times (https://x.com/trq212/status/2037254607001559305) without announcing it (until after the backlash) is the sort of thing that's making people lose trust in Anthropic. They completely ignored support tickets and GitHub issues about that for 3 days.

You shouldn't have to rely on finding an individual employee's posts on Reddit or X for policy announcements.

That policy hasn't even been put into their official documentation nearly one month on - https://support.claude.com/en/articles/11647753-how-do-usage...

A company with their resources could easily do better.

> You shouldn't have to rely on finding an individual employee's posts on Reddit or X for policy announcements.

I agree with this as a principle. Which raises this question: is it true? Are you certain these messages don't show up in (a) Claude Code and (b) Claude on the Web?

I've seen these kinds of messages pop up. I haven't taken inventory of how often they do. As a guess, maybe I see notifications like this several times a month. If any important ones are missing, that is a mistake.

Anyhow, this is the kind of discussion that I want people to have. I appreciate the detail.

> A company with their resources could easily do better.

Yes, they could. But easily? I'm not so sure.

Also ask yourself: what function does saying e.g. "they could have done better" serve? What does it help accomplish? I'm asking. I think it often serves as a sort of self-reinforcing thing to say that doesn't really invite more thinking.

Ask yourself: if "doing better" was easy, why didn't it happen? Maybe it isn't quite as easy as you think? Maybe you've baked in a lot of assumptions. Easy for whom? Easy why? Try the questions I asked above. They are not rhetorical. Here they are again, rephrased a bit:

    > take the top three companies whose product you 
    > interact with on a regular basis. Take stock of
    > (1) how fast the technology is moving;
    > (2) how often things break from your POV;
    > (3) how soon the company acknowledges it;
    > (4) how long it takes for a fix.
    >
    > Then ask "if a friend of mine (competent, hard working)
    > worked there, how would I be thinking about the situation?"
There is a reason why I recommend asking these questions. Forcing yourself to write down your reference class is, to me, table stakes, but lots of people just leave it floating and then ask other people to magically reconstruct it. Envisioning a friend working there shifts your viewpoint and can shake loose many common biases.

Thanks for the example -- you are one of the first people to quote a source, so I appreciate it. This makes constructive discussion much easier. You quoted this:

    > To manage growing demand for Claude we're adjusting our
    > 5 hour session limits for free/Pro/Max subs during peak
    > hours. Your weekly limits remain unchanged.
    >
    > During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll
    > move through your 5-hour session limits faster than before.
And yeah, no disagreement from me: many users are not going to like this. Narrowly speaking, I don't want any change that reduces what I get for what I pay for. I also care about overall reliability, so if some users on the right tail of the usage distribution find themselves losing out, my take is "Yeah, they are disappointed, but this is a rational decision for any company with this kind of subscription model."

Broken expectations are highly dependent on perception. People get used to having some particular level. When that changes and they notice, a strong human default is to reach for something to blame. Then we rationalize. Those last two parts are unhelpful, and I push back on them frequently.

So you're arguing they're just plain incompetent? Not sure that's going to win the trust of customers either.
> So you're arguing they're just plain incompetent? Not sure that's going to win the trust of customers either.

This is not a charitable interpretation of what I wrote. Please take a minute and rethink and rephrase. Here are two important guidelines, hopefully familiar to someone who has had an account since 2019:

> Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

I didn't assume bad faith, I simply reworded your conclusions with less soft language so that others would understand your position more clearly.

You are saying what they are doing is hard. That's fine. Their stated goals are to be the responsible stewards of the technology and we agree they are failing at that goal. You would attribute that to incompetence and not malice.

I personally try to follow Rapoport's Rules, and since I think they are consistent with the HN Guidelines, I like to mention them: [1].

I've thought on it, and I will try to start off with something we both agree on... We both agree that Anthropic made some mistakes, but this is probably a pretty uninteresting and shallow agreement. I find it unlikely that we would enumerate or characterize the mistakes similarly. I find it unlikely that we would be anywhere near the same headspace about our bigger-picture takes.

> I didn't assume bad faith

Ok, I'm glad. That one didn't concern me; if I had a do-over I would remove that one from the list. Sorry about that. These are the ones that concern me:

    > Comments should get more thoughtful and substantive,
    > not less, as a topic gets more divisive.
When I read your earlier comment (~20 words), it didn't come across as a thoughtful and substantive response to my comment (~160 words). I know length isn't a perfect measure nor the only measure, but it does matter.

    > Please respond to the strongest plausible interpretation of what
    > someone says, not a weaker one that's easier to criticize.
Are you sure you didn't choose an easier-to-criticize interpretation? Did you take the time to state to yourself what I was trying to say? Back to Rapoport's Rules ...

    > You should attempt to re-express your target’s position so
    > clearly, vividly, and fairly that your target says, “Thanks,
    > I wish I’d thought of putting it that way.”
I'm grateful when people can express what I'm going for better than the way I wrote it or said it.

> I simply reworded your conclusions with less soft language

Technically speaking, lots of things could be called "rewording", but what you did was relatively far from "simply rewording". Charitably, it is closer to "your interpretation". But my intent was lost, so "rewording" doesn't fit.

> ... so that others would understand your position more clearly.

If you want to help others understand, then it is good to make sure you understand. For that, I recommend asking questions.

> Their stated goals are to be the responsible stewards of the technology and we agree they are failing at that goal.

No, I do not agree to that phrasing. It is likely I don't agree with your intention behind it either.

> You would attribute that to incompetence and not malice.

No; even if I agreed with the premise, I think it is more likely I would still disagree. I don't even like the framing of "either malice or incompetence". These ideas don't carve reality at the joints. [2] [3] There are a lot of stereotypes about "incompetence" but I don't think they really help us understand the world. These stereotypes are more like thought-terminators than interesting generative lenses.

I'll try to bring it back to the words "malice" and "incompetence" even though I think the latter is nigh-useless as a sense-making tool. Many mistakes happen without malice or incompetence; many mistakes "just happen" because people and organizations are not designed to be perfect. They are designed to be good enough. To not make any short-term mistakes would likely require too much energy or too much rigidity, both of which would be a worse category of mistake.

Try to think counterfactually: imagine a world where Anthropic is neither malicious nor incompetent and yet mistakes still happened. What would this look like?

When you think of what Anthropic did wrong, what do you see as the lead-up to it? Can you really envision the chain of events that brought it about? Imagine reading the email chain or the PRs. Can you see how there may have been various "off-ramps" where history might have gone differently? But for each of those diversions, how likely would it be that they match the universe we're in?

At some point, figuring out what counts as a "mistake" even starts to feel strange. Does it require consciousness? Most people think so. Yet we say organizations make mistakes, and they aren't conscious -- or are they? Who do we blame? The CEO, because the buck stops there, right? He "should have known better". But why? Wait, but the Board is responsible...?

Is there any ethical foundation here? Some standard at all or is this all just anger dressed up as an argument? If this assigning blame thing starts to feel horribly complicated or even pointless, then maybe I've made my point. :)

If nothing else, when you read what I write, I want it to make you stop, get out a sheet of paper, and try to imagine something vividly. Your imagination I think will persuade you better than I can.

[1]: https://themindcollection.com/rapoports-rules/

[2]: https://jollycontrarian.com/index.php?title=Carving_nature_a...

[3]: https://english.stackexchange.com/questions/303819/what-do-t...

Do you not think people here work at big companies with big products? I do, and we have a much higher bar for shipping.
>> My overall feel is that people underestimate the complexity of the systems at Anthropic and the chaos of the growth.

> Do you not think people here work at big companies with big products? I do, and we have a much higher bar for shipping.

This form of comment (The "Do you not think {X}?") comes across as a swipe (discouraged by the HN guidelines). It doesn't respond to the strongest plausible interpretation of my comment (also in the guidelines).

That's fair. I'll adjust and say that I think there's a mix: some people certainly are bashing without understanding, but there are also a lot of engineers here whose day to day work is held to a higher standard than I think we see coming out of Anthropic, at least w.r.t. the product side of things (obviously the models are great).
Thanks. Along those lines, here's a sort of thought experiment. Of said engineers who know a higher standard, say we teleported them into Anthropic, what are some likely scenarios?

- How much time would they need to import their standards into Anthropic? ... things like tooling, process, culture, hiring, etc? Maybe externally-sourced discipline and rigor are the missing catalysts. [1]

- OTOH, it seems possible these engineers (many of whom are used to certain levels of stability, sanity, internal tooling, etc.) would be destabilized by Anthropic's problems, the scale, the rate of hiring, the rate of customer growth.

- Perhaps Anthropic needs new instrumentation to cover end-to-end customer metrics? More internal tool-building teams? A new ops team? A new org structure? I don't know.

The growth and the environment have put Anthropic into a position where these kinds of mistakes are just statistically inevitable ... unless they had chosen to grow more slowly.

So my overall hunch (very few people really grok the constellation of factors at Anthropic) is fuzzy. That's why I'm trying to lay out some of the questions that underlie it, without resorting to simplistic notions of blame (which paper over the deeper causes).

Lastly, can you think of comparable scenarios with this kind of growth where companies don't have major hiccups? This is driving towards thinking about the outside view [2]. Roughly speaking: don't expect to "beat the market" for long. Entropy wins.

[1]: I recently watched a video where Steve Jobs described a time in early Macintosh history where Apple tried to "professionalize" its management. Hiring proven managers didn't work, so they shifted towards hiring for cultural fit and letting them grow the management skills.

[2]: https://www.lesswrong.com/w/inside-outside-view

Some of the flak is that issues are often only acknowledged once a fix is in place, and the partial fixes are presented as if they solve the whole problem.

The near-instant transition from "there is no problem" to "we already fixed the problem so stop complaining" is basically gaslighting. (Admittedly the second sentiment comes more from the community, but they get that attitude after taking the "we fixed all the problems" posts at face value.)

And they are often dismissed at first as perception or subjective bias, as people getting used to models being good and having higher expectations because of that, etc. Users are blamed a lot before Anthropic is forced to admit that there is an actual problem.

They gaslit people publicly for months, saying it wasn't an issue.

That's the reason for the flak.

And still are gaslighting:

  We take reports about degradation very seriously. We never intentionally degrade our models [...] On March 4, we changed Claude Code's default reasoning effort from high to medium
Anthropic is the best company of its kind, but that is badly worded PR.
Is adding JPEG compression to your software “intentional degradation” of the software? I wouldn't say providing a selectable option to use a faster, cheaper version of something qualifies as “degradation”.

It is certainly true that they did a poor job communicating this change to users (I did not know that the default was “high” before they introduced it, I assumed they had added an effort level both above and below whatever the only effort choice was there before). On the other hand, I was using Claude Code a fair bit on “medium” during that time period and it seemed to be performing just fine for me (and saving usage/time over “high”), so it doesn't seem clear that that was the wrong default, if only it had been explained better.

Is default enabling JPEG compression to your software's output because the compression saves you money “intentional degradation” of the software?

I would say it does, and I'd be loath to use anything made by people who'd couch that change to defaults as "providing a selectable option to use a faster, cheaper version".

Yuck.

Yes. If Instagram started performing intensive JPEG compression that made photos choppy and unpleasant, I would consider that an intentional degradation of the software.

As I understand Anthropic's recent retrospective, calling the models directly via API did not change; the problem was that the harness changed and this was not communicated well to users.

Metaphorical reasoning is lossy, so talking about lossy image compression seems to be ironically fitting! ... perhaps a (hypothetical) metaphor involves Photoshop changing their default JPEG compression level without making it clear to users. PS did not change the JPEG algorithm, only a setting for it. If you look closely, you would notice it: I'll come back to this point in the last paragraph.

But part of the metaphor breaks down if you accept that Anthropic was making a net-positive trade-off for customers so that they could provide a better overall service level, statistically, to their entire user base.

A rough metaphor for the individual-versus-collective trade-off might be a retail store capping the number of toilet paper rolls customers can buy at a time. The goal is to reduce hoarding, which is in a way analogous to Claude users whose usage patterns sit at the high end of the statistical tail.

When it comes to PR*, transparency almost always wins. Anthropic's mistake hid the change from users, but they're going to notice when overall performance is degraded. I would hazard a guess that Claude has endured more verbal assault in the last month than in its entire history.

* both for public relations and pull requests

To my eye, gaslighting is a serious accusation. Wikipedia's first line matches how I think of it: "Gaslighting is the manipulation of someone into questioning their perception of reality."

Did I miss something? I'm only looking at primary sources to start. Not Reddit. Not The Register. Official company communications.

Did Anthropic tell users something like "you are wrong, your experience is not worse"? If so, that would reach the bar of gaslighting, as I understand it (and I'm not alone). If you have a different understanding, please share what it is so I understand what you mean.

I'd rather not speak too poorly of Anthropic, because - to the extent I can bring myself to like a tech company - I like Anthropic.

That said, the copy uses "we never intentionally degrade our models" to mean something like "we never degrade one facet of our models unless it improves some other facet of our models". This is a cop out, because it is what users suspected and complained about. What users want - regardless of whether it is realistic to expect - is for Anthropic to buy even more compute than Anthropic already does, so that the models remain equally smart even if the service demand increases.

It seems to me you dropped the "gaslighting" claim without owning it. I personally find this frustrating. I prefer when people own up to their mistakes. Like many people, I think "gaslighting" is just not a term you throw around lightly. Then you shifted to "cop out". (This feels like a motte-and-bailey.) But I don't think "cop out" is a phrase that works either...

Some terms: the model is the thing that runs inference. Claude Code is not a model; it is a harness. To summarize Anthropic's recent retrospective, their technical mistakes were about the harness.

I'm not here to 'defend' Anthropic's mistakes. They messed up technically. And their communication could have been better. But they didn't gaslight. And on balance, I don't see net evidence that they've "copped out" (by which I mean mischaracterized what happened). I see more evidence of the opposite. I could be wrong about any of this, but I'm here to talk about it in the clearest, best way I can. If anyone wants to point to primary sources, I'll read them.

I want more people to actually spend a few minutes and give the explanation offered by Anthropic a try. What if isolating the problems was hard? We all know hindsight is 20/20, and yet people still armchair quarterback.

At the risk of sounding preachy, I'm here to say "people, we need to do better". Hacker News is a special place, but we lose it a little bit every time we don't put in a quality effort.

Fair enough. If the comments in question were still editable, I would be happy to replace 'gaslighting' with 'being a bit slippery' or something less controversial.

No worries about 'sounding preachy'; it's a good thing people want to uphold the sobriety that makes HN special.

I think there are plenty of such replies on GitHub. For example, the one to the AMD AI director's issue.
Please link us to it. Linking it provides an anchor for community discussion.
They didn’t say “your experience is not worse” but they did frequently say “just turn reasoning effort back up and it will be fine”. And that pretty explicitly invalidates all the (correct) feedback which said it’s not just reasoning effort.

They knew they had deliberately made their system worse, despite their lame promise published today that they would never do such a thing. And so they incorrectly assumed that their ham-fisted policy blunder was the only problem.

Still plenty I prefer about Claude over GPT but this really stings.

I'm aiming for intellectual honesty here. I'm not taking a side for a person or an org, but I'm taking a stand for a quality bar.

> They knew they had deliberately made their system worse

Define "they". The teams that made particular changes? In real-world organizations, not all relevant information flows to all the right places at the right time. Mistakes happen because these are complex systems.

Define "worse". There are a lot of factors involved. With a given amount of capacity at a given time, some aspect of "quality" has to give. So "quality" is a judgment call. It is easy to use an uncharitable definition to "gotcha" someone. (Some concepts are inherently indefensible. Sometimes you just can't win. "Quality" is one of those things. As soon as I define quality one way, you can attack me by defining it another way. A particular version of this principle is explained in The Alignment Problem by Brian Christian, by the way, regarding predictive policing iirc.)

I'm seeing a lot of moral outrage but not enough intellectual curiosity. It's embarrassingly easy to say "they should have done better" ... ok. Until someone demonstrates to me they understand the complexity of a nearly-billion-dollar company rapidly scaling with new technology, growing faster than most people comprehend, I think ... they are just complaining and cooking up reasons so they are right in feeling that way. This possible truth, that complex systems are hard to do well, apparently doesn't scratch that itch for many people. So they reach for blame. This is not the way to learn. Blaming tends to cut off curiosity.

I suggest this instead: redirect if you can to "what makes these things so complicated?" and go learn about that. You'll be happier, smarter, and ... most importantly ... be building a habit that will serve you well in life. Take it from an old guy who is late to the game on this. I've bailed on companies because "I thought I knew better". :/

> Define "they". The teams that made particular changes? In real-world organizations, not all relevant information flows to all the right places at the right time. Mistakes happen because these are complex systems.

Accidentally/deliberately making your CS teams ill-informed should not function as a get out of jail free card. Rather the reverse.

I know some people use the word "gaslighting" in connection with Anthropic. I've read some of those threads here, and some on Reddit, but I don't put much stock in them. To step back, hopefully reasonable people can start here:

    1. Degraded service sucks.
    2. Anthropic not saying i.e. "we're not seeing it" sucks.
    3. Not getting a fix when you want it sucks.
Try to understand what I mean when I say none of the above meet the following sense of gaslighting: "Gaslighting is the manipulation of someone into questioning their perception of reality." Emphasis on understand what I mean. This says it well: [1].

If you can point me to an official communication from Anthropic where they say "User <so and so> is not actually seeing degraded performance" when Anthropic knows otherwise that would clearly be gaslighting -- intent matters by my book.

But if their instrumentation was bad and they were genuinely reporting what they could see, that doesn't cross into gaslighting by my book. But I have a tendency to think carefully about ethical definitions. Some people just grab a word off the shelf with a negative valence and run with it: I don't put much stock in what those people say. Words are cheap. Good ethical reasoning is hard and valuable.

It's fine if you have a different definition of "gaslighting". Just remember that some of us have actually been gaslit by people, so we prefer to save the word for situations where the original definition applies. People like us are not opposed to being disappointed, upset, or angry at Anthropic, but we have certain epistemic standards that we don't toss out when an important tool fails to meet our expectations and the company behind it doesn't recognize it soon enough.

[1]: https://www.reddit.com/r/TwoXChromosomes/comments/tep32v/can...

The explanations are all fine.

But they come after the team gaslit everyone, telling us it was a skill issue.