Hacker News
by dave_universetf 323 days ago
BGP is the canonical way to take your entire global network offline within seconds. Glancing at BGP looking glasses, Starlink's prefixes seem to still be announced, but there could still be an accidental blackhole or routing loop within their AS, or something broken in one of their transit providers.

No idea if that's what's going on, but routing protocols are one of a few effectively global control planes that can go wrong very quickly like this.

6 comments

If it were BGP/routing, you would think we'd still be able to get a signal and the modem would think it's healthy (although maybe not if the issue prevented us from obtaining our public IP); we just wouldn't be able to route to any dst. In the current case we don't have a signal (orange light on the modem).
Yes, before the dropout my traffic was coming from a downlink station in Bulgaria, from an IP in AS14593.

Traceroute from "the internet" back to that IP reaches AS14593 just fine, and my endpoint doesn't get beyond the first hop of the local Starlink router.

Whatever it is, it doesn't look like a peering problem
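To spell out the reasoning, here is a rough sketch of how those observations narrow things down. The class, field names, and verdict strings are all made up for illustration; this is just the inference from this thread written as code, not any real diagnostic tool.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Hypothetical summary of what can be checked from outside and inside."""
    prefixes_announced: bool   # prefixes visible in public BGP looking glasses
    inbound_reaches_as: bool   # traceroute from the internet reaches the provider's AS
    local_hops_reached: int    # hops the local traceroute gets through


def likely_fault(obs: Observations) -> str:
    """Return a coarse guess at where the fault sits, given the observations."""
    if not obs.prefixes_announced:
        return "prefixes withdrawn: external BGP problem"
    if not obs.inbound_reaches_as:
        return "announced but unreachable: transit or peering problem"
    if obs.local_hops_reached <= 1:
        return "stuck at the local router: access link or inside the provider's network"
    return "no fault visible from these checks"


# The situation described above: routes announced, inbound traceroute fine,
# local traceroute dies at the first hop.
print(likely_fault(Observations(True, True, 1)))
```

This only rules out the external routing cases; it says nothing about which internal system actually failed.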

https://mtr.ping.pe ftw for MTR from "the internet" ? :)
From various monitoring points I have on multiple internet connections.

One of the promises of Starlink was that traffic would stay in space as long as possible before being downlinked, giving far lower latency. Alas, that hasn't happened yet, and traffic will run thousands of miles in the wrong direction before being downlinked. For example, from one location to another I have 360ms RTT via Starlink but just 200ms via local provision (5G point-to-point wireless, then optical). On another link it used to downlink in Lagos, but now it downlinks in Nairobi, meaning traffic to Lagos routes Nairobi -> Marseille -> Lagos, taking far longer than it used to. A shame really.
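For a rough sense of what that detour costs, here is a back-of-the-envelope sketch. The city coordinates are approximate values I filled in, and the 200 km per ms figure is the usual rule of thumb for light in fiber (about two thirds of c). Great-circle distance is only a lower bound, since real cable routes are longer and every hop adds queueing and processing delay.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers roughly 200 km per ms

# Approximate (latitude, longitude) in degrees; assumed values for illustration.
NAIROBI = (-1.29, 36.82)
MARSEILLE = (43.30, 5.37)
LAGOS = (6.52, 3.38)


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(*points):
    """Lower bound on round-trip time along a path that visits points in order."""
    one_way_km = sum(great_circle_km(p, q) for p, q in zip(points, points[1:]))
    return 2 * one_way_km / FIBER_KM_PER_MS


detour = min_rtt_ms(NAIROBI, MARSEILLE, LAGOS)
straight = min_rtt_ms(NAIROBI, LAGOS)
print(f"Nairobi -> Marseille -> Lagos: at least {detour:.0f} ms RTT")
print(f"Nairobi -> Lagos as the crow flies: at least {straight:.0f} ms RTT")
```

Even as a lower bound, the terrestrial leg via Marseille comes out at more than twice the straight-line figure, and none of it existed when the downlink was in Lagos itself.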

Does the orange light specifically mean no RF link at all? Or does it include anything that prevents the modem from getting an IP address and route configuration? If the latter, BGP could still be at fault if it took out access to the control planes on the ground. But again, I'm just guessing. From the outside, all I see is that the BGP routes are still being announced, and everyone seems to be seeing 100% packet loss and zero traffic.
Right, good point, that could be the case. Those were just my assumptions, and I was probably jumping to conclusions in speculating that orange means no signal (I don't actually have any idea :) ). I imagine it could be any of what you said too.
That sounds bad…
A network of satellites gives you entirely new and exciting ways of taking your network offline, such as bricking them with a firmware update, with no way to actually get up there and fix it.
Imagine being that dev who pushed the bad patch - CrowdStrike, but 100x
CrowdStrike’s fuckup was at the company level.

Any company doing immediate global deployments and/or any prod deployments without verified testing just has a broken corporate culture.

Elon saw yesterday's article about The Promised LAN, and said "What if we connected that instead of the Internet...?" https://news.ycombinator.com/item?id=44661682#44663409
Instead, it seems he's created the LAN of the lost
It's pretty amazing how much damage a BGP oopsie can do. Also, you can fit pretty much all the BGP admins for the entire Internet in one large room.
If by “room” you mean “Wembley stadium”, maybe.
Messing up with BGP makes you feel alive, that’s for sure.

But hey, if you haven’t caused an incident yet, that just means you’re still in onboarding. Those SLA downtime budgets are there to be spent.

I rebooted my terminal and I can’t tell for sure if it sees any satellites. It looks like it doesn't.

It says it didn’t see any, and it says the “which way is down” thing hasn’t converged. Occasionally, the signal-to-noise ratio light in the app goes gray, which means the ratio is below 3.

It also rebooted itself.

Before the first reboot, 30% of pings went through. It’s almost like the azimuth or some other time-sensitive but cached data was corrupted.

It's always a bad route that was introduced during a planned upgrade
I thought it was always DNS