Unfortunately, no valuation, however high, can fix the global supply shortage of GPUs for inference.
I suspect they're highly oversubscribed, thus the reason why we're seeing them do other things to cut down on inference cost (ie changing their default thinking length).
Wouldn't that be good? I remember back in the day you could only get Gmail through an invite, and it was an awesome strategy. "Currently closed for applications" creates FOMO. They'd just need to actually get the GPUs in relatively short supply. They could do it in bursts though, right? "Now accepting applications for a short time."
I'm not an internet marketer, but that sounds like a win-win to me. People feel special, they get extra hype, and the service isn't broken.
Are you sure it was fake scarcity for Gmail? IIRC they did it because they were worried about systems falling over if it grew too fast, and discovered the marketing benefits as a side effect.
maybe, but the response to GPU shortages being increased error rates is the concern imo. they could implement queuing or delayed response times. it's been long enough that they've had plenty of time to implement things like this, at least on their web-ui where they have full control. instead it still just errors with no further information.
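To make the queuing idea concrete, here is a minimal sketch of admitting requests up to a capacity limit and queuing the overflow, so the UI can show "you are number N in line" instead of a bare error. Everything here (`AdmissionQueue`, `capacity`, `max_waiting`, the status dictionaries) is hypothetical and only illustrates the idea; it says nothing about how any real provider handles load.

```python
from collections import deque


class AdmissionQueue:
    """Admit up to `capacity` requests; queue the rest instead of failing."""

    def __init__(self, capacity, max_waiting):
        self.capacity = capacity        # assumed number of concurrent slots
        self.max_waiting = max_waiting  # assumed upper bound on the wait queue
        self.active = set()
        self.waiting = deque()

    def submit(self, request_id):
        """Return a status the UI could show to the user."""
        if len(self.active) < self.capacity:
            self.active.add(request_id)
            return {"status": "running"}
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(request_id)
            return {"status": "queued", "position": len(self.waiting)}
        # Only shed load once the queue itself is full, and say why.
        return {"status": "rejected", "reason": "queue full, try again later"}

    def finish(self, request_id):
        """Free a slot and promote the oldest waiting request, if any."""
        self.active.discard(request_id)
        if self.waiting and len(self.active) < self.capacity:
            promoted = self.waiting.popleft()
            self.active.add(promoted)
            return promoted
        return None
```

The point of the design is that a rejection only happens when the wait queue is also full, and even then it carries a reason the client can display.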
i notice that as well. most of the time when i see those it has a retry counter also and i can see it trying and failing multiple requests haha. almost never succeeds in producing a response when i see those though, eventually just errors out completely.
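For reference, the retry counter described above usually looks something like the sketch below: exponential backoff with jitter and a hard cap on attempts. This is a guess at the general pattern, not the client's actual code; `TransientServerError`, `call_with_retries`, and the default delays are all made-up names and values.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for a 5xx response from the server."""


def backoff_delays(max_attempts, base=0.5, cap=30.0, rng=random.random):
    """Yield one capped, jittered delay per retry (full jitter)."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(request, max_attempts=5, sleep=time.sleep, **kwargs):
    """Call request() until it succeeds or the attempts run out."""
    delays = backoff_delays(max_attempts, **kwargs)
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientServerError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(next(delays))
```

If the backend is overloaded for longer than the whole backoff window, every attempt fails and the last error is raised, which matches the "retries a few times, then errors out completely" behavior.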
That implies that either the auth is too heavy (possible, ish) or their systems don't degrade gracefully enough, so many different types of failures propagate up and out all the way to their outermost layer, i.e. auth (more plausible).
Disclosure: I have scars from a distributed system where errors propagated outwards and took down auth...
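One common way to stop failures from propagating outward like that is a circuit breaker around each dependency. The sketch below is a generic, simplified version of that pattern, not a description of anyone's real architecture; `CircuitBreaker`, `failure_threshold`, and `reset_after` are names and defaults chosen for illustration.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated failures instead of passing them upstream."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        # While open, skip the dependency entirely and serve the fallback.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the breaker again
        return result
```

The outer layer calls the flaky dependency through `call` and gets a degraded answer from `fallback` while the breaker is open, rather than inheriting the dependency's failure.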
> thus the reason why we're seeing them do other things to cut down on inference cost (ie changing their default thinking length).
The dynamic thinking and response length is funny enough the best upgrade I've experienced with the service for more than a year. I really appreciate that when I say or ask something simple the answer now just comes back as a single sentence without having to manually toggle "concise" mode on and off again.
I'm pretty sure ai-x writes sarcasm and skips the /s for pure fun. Personally, I'm amused and I like what he's doing. Others have done it before him though, it's not a new trick.
I literally just came to HN to ask if I was alone with the accurséd "API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"},"request_id":"…"}" greeting me and telling me to get back to using my brain!
500-series errors are server-side; 400-series errors are client-side.
A 500 error is almost never "just you".
(404 is a client error, because it's the client requesting a file that does not exist, a problem with the client, not the server, who is _obviously_ blameless in the file not existing.)
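The rule of thumb above, as a tiny sketch. `classify_status` is just an illustrative helper name; the ranges are the standard HTTP status classes.

```python
def classify_status(code):
    """Map an HTTP status code to who is nominally at fault."""
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "not an error"


for code in (200, 404, 500, 503):
    print(code, classify_status(code))
```

"Nominally" is doing some work there: the class only says which side the server blames, not where the root cause actually lives.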
I know you added the defensive "almost" but if I had a dollar each time I saw a 500 due to the session cookies being sent by the client that made the backend explode - for whatever root cause - well, I would have a fatter wallet.
Indeed, and also there's a special circle of hell reserved for anyone who dares change the interface on a public API, and forgets about client caching leading to invalid requests but only for one or two confused users in particular.
Bonus points if due to the way that invalid requests are rejected, they are filtered out as invalid traffic and don't even show up as a spike in the application error logs.
I know that in principle this is true. However, I have seen Claude shadow-throttle my IPv4 address (I am behind CGNAT), in line with their "VPN" policy -- so I do not trust it, frankly.
This is how I learn that they have a "VPN" policy. Thinking about it, maybe it makes sense, if it's what I think it is, but it seems scummy nonetheless.
Yep, daily haha. Well at least this time they aren't just silently reducing thinking on the server side, which ended up making a mess in my codebase when they did that last time. I'd rather a 500 than a silent rug-pull.