by brookst 4 hours ago
Is there any indication these errors are related to Anthropic-written code as opposed to operational issues from the fastest-growing infra buildout ever?

Layer-wise, the app is pretty far removed from request routing to GPU pools.


This is almost certainly a software issue, though. Even if it's due to scaling, they still built a system that failed catastrophically rather than degrading gracefully.
Sure. But could it be k8s config? Could it be Nvidia Bright Cluster? Could it be load balancing?

I'm not saying Anthropic isn't to blame for a system that is literally approaching one-nine uptime; they certainly are. I am saying that jumping to "it must be vibe coding's fault" is an emotional, confirmation-bias belief, not an evidence-based one.

I'd expect that they're also managing their k8s config and other infra using LLMs (they're actually quite good at this, at least for my simple homelab use cases).
> failed catastrophically rather than degrading gracefully

You mean like returning 529s and operating with reduced QoS?

Right. If this were truly a pure scaling issue, I’d expect the interface would offer an archive.is-esque “Claude is at capacity; your prompt is #XXX/YYY in the queue; estimated time remaining: ZZZ seconds”

Instead, the whole system just shits the bed, catastrophically.
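
For what it's worth, the queue message described above is not hard to sketch. This is a toy, not a claim about how Anthropic's serving stack works: `AdmissionQueue`, `capacity`, and `avg_service_seconds` are all made-up names, and the ETA is a crude position-times-average estimate that ignores context size and everything else that makes real inference scheduling hard.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AdmissionQueue:
    """Hypothetical front door: admit up to `capacity` requests,
    queue the rest, and tell each waiter where it stands."""
    capacity: int               # assumed number of requests served at once
    avg_service_seconds: float  # assumed mean time to serve one request
    active: set = field(default_factory=set)
    waiting: deque = field(default_factory=deque)

    def submit(self, request_id: str) -> str:
        # Admit immediately while there is a free slot, otherwise enqueue.
        if len(self.active) < self.capacity:
            self.active.add(request_id)
            return "admitted"
        self.waiting.append(request_id)
        return self.status(request_id)

    def status(self, request_id: str) -> str:
        if request_id in self.active:
            return "admitted"
        position = self.waiting.index(request_id) + 1
        # Rough ETA: one slot frees up about every avg_service_seconds / capacity.
        eta = position * self.avg_service_seconds / self.capacity
        return (
            f"at capacity; your prompt is #{position}/{len(self.waiting)} "
            f"in the queue; estimated time remaining: {eta:.0f} seconds"
        )

    def finish(self, request_id: str) -> None:
        # Free the slot and promote the longest-waiting request, if any.
        self.active.discard(request_id)
        if self.waiting and len(self.active) < self.capacity:
            self.active.add(self.waiting.popleft())


q = AdmissionQueue(capacity=2, avg_service_seconds=30.0)
q.submit("a")
q.submit("b")
print(q.submit("c"))  # third request waits and gets a queue position
q.finish("a")
print(q.status("c"))  # promoted once a slot frees up
```

The point is only that a bounded queue gives the client something honest to display; it says nothing about whether the failures in question were capacity-related in the first place.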

But such messages would suggest that Claude has engineered limits, which isn't what the market wants to hear. Completely falling over and being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.
> being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.

This is true when you have like one failure a year, but Anthropic is starting to look a lot like GitHub lately when it comes to uptime.

After a certain point the reputation for unreliability starts sticking to you, especially when you position yourself as an indispensable tool for completing work people need done.

Yeah, to those of us who are in the know, but maybe it can be spun as "Claude adoption breaks the internet" to the consumer.
Except these models are not run prompt-to-prompt. The infra has to hold the entire context.
I'm not sure that's really an Anthropic problem you're pointing to, as opposed to a problem that their infra layer handles (Amazon, Google, whatever hyperscaler). I.e., they might be scaling quickly, but they are running on top of established infrastructure.