There's probably a curve of diminishing returns when it comes to how much effort you throw in to improve uptime, which also directly affects the degree of overengineering around it.
I'm not saying that it should excuse straight up bad engineering practices, but I'd rather have them iterate on the core product (and maybe even make their Electron app more usable: not to have switching conversations take 2-4 seconds sometimes when those should be stored locally and also to have bare minimum such as some sort of an indicator when something is happening, instead of "Let me write a plan" and then there is nothing else indicating progress vs a silently dropped connection) than pursue near-perfect uptime.
Sorry about the usability rant, but my point is that I'd expect medical systems and planes to have amazing uptime, whereas most other things that have lower stakes I wouldn't be so demanding of. The context I've not mentioned so far is that I've seen whole systems get developed poorly, because they overengineered the architecture and crippled their ability to iterate, sometimes thinking they'd need scale when a simpler architecture, but a better developed one would have sufficed!
Ofc there's a difference between sometimes having to wait in a queue for a request to be serviced or having a few requests get dropped here and there and needing to retry them vs your system just having a cascading failure that it can't automatically recover from and that brings it down for hours. Having not enough cards feels like it should result in the former, not the latter.
Ehh, I'd say there's not much difference between the UI being in a state that's for all intents being frozen due to not having a status indicator or actually having a dropped request and the UI doing nothing because there's nothing going on, you know?
Or having the Electron UI being sluggish 99% of the time during daily use vs dealing with that 1% of the time when there's outages. I'd rather have that 99% be good and 1% not work, than 99.9% be miserable and 0.01% not work.
Yep, it's not like electricity which is an essential service.
If electricity faults for one second, chaos breaks loose as so many things depend on it. Imagine having several microblackouts a day? We wouldn't tolerate. It's so reliable that normal people don't need redundancy for it, they just tap into the stream which is always available.
AI is definitely shaping up to NOT become like that. We're designing for an unreliable system (try again buttons, etc), and the use cases follow that design.
I'm not saying that it should excuse straight up bad engineering practices, but I'd rather have them iterate on the core product (and maybe even make their Electron app more usable: not to have switching conversations take 2-4 seconds sometimes when those should be stored locally and also to have bare minimum such as some sort of an indicator when something is happening, instead of "Let me write a plan" and then there is nothing else indicating progress vs a silently dropped connection) than pursue near-perfect uptime.
Sorry about the usability rant, but my point is that I'd expect medical systems and planes to have amazing uptime, whereas most other things that have lower stakes I wouldn't be so demanding of. The context I've not mentioned so far is that I've seen whole systems get developed poorly, because they overengineered the architecture and crippled their ability to iterate, sometimes thinking they'd need scale when a simpler architecture, but a better developed one would have sufficed!
Ofc there's a difference between sometimes having to wait in a queue for a request to be serviced or having a few requests get dropped here and there and needing to retry them vs your system just having a cascading failure that it can't automatically recover from and that brings it down for hours. Having not enough cards feels like it should result in the former, not the latter.