HN Mirror



	by brookst 4 hours ago
	Is there any indication these errors are related to Anthropic-written code as opposed to operational issues from the fastest-growing infra buildout ever? Layer-wise, the app is pretty far removed from request routing to GPU pools.

organsnyder 4 hours ago

This is almost certainly a software issue, though. Even if it's due to scaling, they still built a system that failed catastrophically rather than degrading gracefully.

brookst 3 hours ago

Sure. But could it be k8s config? Could it be NVIDIA Bright Cluster Manager? Could it be load balancing?

I'm not saying Anthropic isn't to blame for a system that is literally approaching one-nine uptime; they certainly are. I am saying that jumping to "it must be vibe coding's fault" is an emotional, confirmation-bias belief, not an evidence-based one.

organsnyder 3 hours ago

I'd expect that they're also managing their k8s config and other infra using LLMs (they're actually quite good at this, at least for my simple homelab use cases).

dpark 1 hour ago

> failed catastrophically rather than degrading gracefully

You mean like returning 529s and operating with reduced QoS?
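A minimal sketch of the load shedding described here, assuming a single-process gate with a fixed cap on in-flight requests. The `LoadShedder` class and its `max_in_flight` parameter are hypothetical illustrations, not Anthropic's actual implementation; 529 is a non-standard status used only because it is the one named in the comment. Rejecting excess requests quickly with an explicit error keeps admitted requests healthy instead of letting every request time out:

```python
from dataclasses import dataclass

OVERLOADED = 529  # non-standard "overloaded" status, as named in the comment
OK = 200


@dataclass
class LoadShedder:
    """Hypothetical admission gate that rejects new work past a fixed in-flight cap."""

    max_in_flight: int
    in_flight: int = 0

    def try_admit(self) -> int:
        # Over capacity: shed the request immediately with an explicit error
        # rather than accepting it and letting everything time out.
        if self.in_flight >= self.max_in_flight:
            return OVERLOADED
        self.in_flight += 1
        return OK

    def release(self) -> None:
        # Call when an admitted request finishes, freeing a slot.
        self.in_flight = max(0, self.in_flight - 1)


gate = LoadShedder(max_in_flight=2)
print([gate.try_admit() for _ in range(3)])  # [200, 200, 529]
gate.release()
print(gate.try_admit())  # 200
```

This is not thread-safe as written; a real server would guard the counter with a lock or use an async semaphore.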

MontyCarloHall 4 hours ago

Right. If this were truly a pure scaling issue, I’d expect the interface to offer an archive.is-style message: “Claude is at capacity; your prompt is #XXX/YYY in the queue; estimated time remaining: ZZZ seconds.”

Instead, the whole system just shits the bed, catastrophically.
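A rough sketch of the queue-position message described in this comment, assuming a single FIFO queue, a fixed number of workers, and a known average service time per request. `PromptQueue`, `capacity`, `workers`, and `avg_service_s` are hypothetical names chosen for illustration; nothing here reflects how Anthropic actually schedules requests:

```python
from __future__ import annotations

from collections import deque


class PromptQueue:
    """Hypothetical FIFO waiting room that reports each caller's place in line."""

    def __init__(self, capacity: int, workers: int, avg_service_s: float) -> None:
        self.capacity = capacity            # max requests allowed to wait
        self.workers = workers              # requests served concurrently (assumed fixed)
        self.avg_service_s = avg_service_s  # assumed average time per request
        self.waiting: deque[str] = deque()

    def enqueue(self, request_id: str) -> str:
        # Queue full: say so explicitly instead of failing opaquely.
        if len(self.waiting) >= self.capacity:
            return "Claude is at capacity; please try again later"
        self.waiting.append(request_id)
        position = len(self.waiting)
        # Rough ETA to completion: ceil(position / workers) rounds of avg_service_s.
        eta_s = ((position - 1) // self.workers + 1) * self.avg_service_s
        return (
            f"your prompt is #{position}/{self.capacity} in the queue; "
            f"estimated time remaining: {eta_s:.0f} seconds"
        )

    def dequeue(self) -> str | None:
        # A worker pulls the oldest waiting request, if any.
        return self.waiting.popleft() if self.waiting else None


q = PromptQueue(capacity=3, workers=2, avg_service_s=10.0)
for rid in ["a", "b", "c", "d"]:
    print(q.enqueue(rid))
```

The ETA is deliberately crude: it assumes every request takes the same time, which does not hold for LLM prompts of widely varying length.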

SoftTalker 3 hours ago

But such messages would suggest that Claude has engineered limits, which isn't what the market wants to hear. Completely falling over and being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.

georgemcbay 3 hours ago

> being unavailable is just another Tuesday on the internet, will be forgotten by the weekend.

This is true when you have like one failure a year, but Anthropic is starting to look a lot like GitHub lately when it comes to uptime.

After a certain point the reputation for unreliability starts sticking to you, especially when you position yourself as an indispensable tool for completing work people need done.

matltc 2 hours ago

Yeah, to those of us who are in the know, but maybe it can be spun to consumers as "Claude adoption breaks the internet".

slashdave 1 hour ago

Except these models are not run prompt-to-prompt. The infra has to hold the entire context.

Insanity 3 hours ago

I'm not sure that's really an Anthropic problem you're pointing to, as opposed to a problem their infra layer handles (Amazon, Google, whichever hyperscaler). That is, they might be scaling quickly, but they're running on top of established infrastructure.
