

by dbingham 1175 days ago
What is going on over there? Third day in a row is... kind of impressive.

8 comments

aranw 1175 days ago

Lots of copilot generated code failing


practice9 1175 days ago

Not only Copilot, seems like some Microsoft services like Bing AI and Bing Image Creator have some issues today as well with 4xx / 5xx, and incorrect region authorization (had to switch the account region from a European country to US to make it work again on mobile)


tesin 1174 days ago

I think you may have missed the joke: they were implying GitHub was using Copilot internally and that its poor output was causing the outages, not that Copilot itself was unavailable (although that may also be true).


iepathos 1175 days ago

They blamed the March and April outages on a database query that changed as part of an infrastructure change they rolled out. I'm guessing that infrastructure change caused some other race condition that they are only seeing after a major production failure, because they didn't load test enough in their staging environment: https://github.blog/2023-05-03-github-availability-report-ap...


edgyquant 1175 days ago

As much as I’ve been frustrated by these outrages, we’ve all been there


qmacro 1175 days ago

Now that is a good typo


robofanatic 1175 days ago

could be on purpose


frde 1175 days ago

My money is on some significant backend architecture migration gone wrong without a viable way to roll back the time machine :)


capableweb 1175 days ago

Feels like an organization as big as Microsoft must surely have some sort of contingency plan in place before doing such a large migration. Right?


isbvhodnvemrwvn 1175 days ago

Sales handing out Office365 discounts and trying to convince people that AWS and GCP are going to steal their data, judging by companies I worked for that used Azure.


hardware2win 1175 days ago

Wasn't it true? That Amazon abused its AWS position and stole its competitors' data, and that's why Germany's retail businesses are building their own clouds?


jacooper 1174 days ago

Any links? Interested in reading more about this.


belmont_sup 1174 days ago

Here’s one

https://www.wsj.com/articles/amazon-scooped-up-data-from-its...

The gist is that yes it’s true. They’d come out with their own Amazon Basics branded stuff and push it to the top.


xeromal 1175 days ago

I worked on the volume licensing part of Microsoft years ago and deployments were stressful. They'd start late on Friday, around 8pm, and go until 8am the next morning. Everyone was on a long call the entire time. I hated it.


whynotmaybe 1175 days ago

At this point, I'm wondering if it's me, just because I updated some libraries on my local build agent.


lallysingh 1175 days ago

Azure?


zamalek 1175 days ago

Just another day doing DevOps for a Ruby on Rails product.


candiddevmike 1175 days ago

Today seems worse than yesterday. I'm getting wildly inconsistent results when viewing repositories after a push. Hard to tell if my push actually went through, and it's not triggering actions.


robofanatic 1175 days ago

exact same issue


SideburnsOfDoom 1175 days ago

It starts around 09:30 on the US East Coast.

I suspect that it's related to high load.


pera 1175 days ago

Maybe they fired too many people this time? https://news.ycombinator.com/item?id=35334705


tonyhb 1175 days ago

From an SRE, one of their DB clusters failed. They use Vitess which is great, but it can be prone to hotspots and doesn't auto-shard. Heavy usage (esp. from large customers, rogue jobs) can take down the cluster. When it goes down, it's a PITA to resolve.


samlambert 1175 days ago

This literally isn't true and looks awfully like the talking points of one of our competitors.


tonyhb 1175 days ago

Ah, unbalanced shards via wrong sharding keys was an issue at one point, IIRC. I remember talking with an SRE there when something bad happened at GitHub last year, and I know that this time the current DB cluster failed.

To be clear, I _was_ mapping previous incidents with this year's incident — no competitor or hard feelings involved. I really like Vitess, fwiw. And the only thing I really love is FoundationDB :)


samlambert 1175 days ago

That wasn't clear.

Side note: "Autosharding" is largely a myth that unproven databases are touting. Sharding is complex and requires planning and control. Databases that start shuffling data around without oversight produce nasty surprises. Trying to be too magic is almost always a mistake with databases.


tonyhb 1175 days ago

Yeah, fair, totally get it. Wasn't aiming to spread FUD, and I know that FDB is a little hard to compare against... it is pretty magic with how it routes and shards :D (https://forums.foundationdb.org/t/keyspace-partitions-perfor...)


tonyhb 1169 days ago

For posterity: https://github.blog/2023-05-16-addressing-githubs-recent-ava...

It was the DB, and it was rogue usage on May 10, so I'm standing by my original comment


cheshire137 1175 days ago

What would you know, random Hacker News commen--oh. Hi Sam, carry on.


samlambert 1175 days ago

<3 Hey Sarah!
