by dbingham 1128 days ago
What is going on over there? Third day in a row is... kind of impressive.
8 comments

Lots of Copilot-generated code failing
Not only Copilot. It seems like some Microsoft services, such as Bing AI and Bing Image Creator, are having issues today as well: 4xx/5xx errors and incorrect region authorization (I had to switch my account region from a European country to the US to get it working again on mobile).
I think you may have missed the joke. They were implying that GitHub was using Copilot internally and that its poor output was causing the outages, not that Copilot itself was unavailable (although that may also be true).
They blamed the March and April outages on a database query that was changed as part of an infrastructure change they rolled out. My guess is that the same infrastructure change introduced some other race condition, which they are only seeing now after a major production failure because they didn't load test enough in their staging environment. https://github.blog/2023-05-03-github-availability-report-ap...
As much as I’ve been frustrated by these outrages, we’ve all been there
Now that is a good typo
Could be on purpose.
My money is on some significant backend architecture migration gone wrong without a viable way to roll back the time machine :)
Feels like an organization as big as Microsoft must surely have some sort of contingency plan in place before doing such a large migration. Right?
Sales handing out Office365 discounts and trying to convince people that AWS and GCP are going to steal their data, judging by the companies I worked for that used Azure.
Wasn't it true that Amazon abused its AWS position and stole its competitors' data? That's why Germany's retail businesses are building their own clouds.
Any links? I'm interested in reading more about this.
Here’s one

https://www.wsj.com/articles/amazon-scooped-up-data-from-its...

The gist is that yes it’s true. They’d come out with their own Amazon Basics branded stuff and push it to the top.

I worked on the volume licensing part of Microsoft years ago, and deployments were stressful. They'd start late on Friday, around 8pm, and run until 8am the next morning. Everyone was on one long call the entire time. I hated it.
At this point, I was wondering whether it was just me, since I had updated some libraries on my local build agent.
Azure?
Just another day doing DevOps for a Ruby on Rails product.
Today seems worse than yesterday. I'm getting wildly inconsistent results when viewing repositories after a push. Hard to tell if my push actually went through, and it's not triggering actions.
Exact same issue.
It kicks in around 09:30 on the US East Coast.

I suspect that it's related to high load.

Maybe they fired too many people this time? https://news.ycombinator.com/item?id=35334705
According to an SRE, one of their DB clusters failed. They use Vitess, which is great, but it can be prone to hotspots and doesn't auto-shard. Heavy usage (especially from large customers or rogue jobs) can take down the cluster, and when it goes down, it's a PITA to resolve.
This literally isn't true and looks awfully like the talking points of one of our competitors.
Ah, unbalanced shards caused by poorly chosen sharding keys were an issue at one point, IIRC. I remember talking with an SRE there when something bad happened at GitHub last year, and I know that this time the current DB cluster failed.

To be clear, I _was_ mapping previous incidents with this year's incident — no competitor or hard feelings involved. I really like Vitess, fwiw. And the only thing I really love is FoundationDB :)

That wasn't clear.

Side note: "autosharding" is largely a myth touted by unproven databases. Sharding is complex and requires planning and control. Databases that start shuffling data around without oversight produce nasty surprises. Trying to be too magic is almost always a mistake with databases.

Yeah, fair, totally get it. Wasn't aiming to spread FUD, and I know that FDB is a little hard to compare against... it is pretty magic with how it routes and shards :D (https://forums.foundationdb.org/t/keyspace-partitions-perfor...)
For posterity: https://github.blog/2023-05-16-addressing-githubs-recent-ava...

It was the DB, and it was rogue usage on May 10, so I'm standing by my original comment.

What would you know, random Hacker News commen--oh. Hi Sam, carry on.
<3 Hey Sarah!