by EduardoBautista 15 days ago
May has been filled with critical issues. It seems it's getting worse over time.

Commits are up 14x year-over-year

https://x.com/kdaigle/status/2040164759836778878

Yeah, but that's not really an excuse, is it? They offer a service, (some) people pay for that service and should therefore expect it to work. If GitHub cannot keep up with the growth, then they could disable new account registrations or start reducing free tiers, so people either use the free tier more mindfully or need to pay for usage-based products like Actions, which would allow GitHub to scale.
I mean, it's an easy problem to solve when you're just speculating about solutions. But there's a very possible reality where in 5 years guys are making YouTube video essays about the fall of GitHub caused by their "obviously stupid decision" to throttle access to people who were trying to use their service in record numbers, leaving an opportunity for someone else to come in and eat their lunch.

I don't envy their position of having to scale that fast on something that has to be instant and real-time. As far as I know, you can't do CDN/edge caching shenanigans with a remote git repository like Google can with a YouTube video. It's gotta always be reading/writing to the latest, single source of truth.

Sure, backseat commenting is easier and I wouldn't wanna be in charge at GitHub right now, but on the other side there's also a reality where we'd see video essays about GitHub's downfall because their reliability crashed so hard that businesses could not trust them and moved to competitors / self-hosted instances, which then meant fewer paying users to subsidize the ever-growing demand of the free users.
It’s not quite as cacheable as YouTube, but a lot of it is still pretty cacheable. Actions aren’t. Issues, wikis, and READMEs are. File views are. Most projects aren’t changing daily. The few that change daily aren’t changing hourly. There are a few that change constantly and would require constant cache updates. But the long tail is pretty static.
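To make the "long tail is pretty static" point concrete, here's a toy sketch in Python of a read-through cache that only gets invalidated when a repo is pushed to. Everything in it (`RepoViewCache`, `on_push`, `slow_render`, the repo name) is made up for illustration; I have no idea how GitHub actually caches these pages.

```python
class RepoViewCache:
    """Toy read-through cache for rendered repo pages, invalidated on push."""

    def __init__(self, render):
        self._render = render   # hypothetical expensive render function
        self._pages = {}        # (repo, path) -> rendered page
        self.hits = 0
        self.misses = 0

    def get(self, repo, path):
        key = (repo, path)
        if key in self._pages:
            self.hits += 1
            return self._pages[key]
        self.misses += 1
        page = self._render(repo, path)
        self._pages[key] = page
        return page

    def on_push(self, repo):
        # Drop every cached page for the repo that just changed.
        for key in [k for k in self._pages if k[0] == repo]:
            del self._pages[key]


calls = []

def slow_render(repo, path):
    # Stand-in for reading from git storage and rendering a page.
    calls.append((repo, path))
    return f"<rendered {repo}/{path}>"

cache = RepoViewCache(slow_render)
cache.get("octo/quiet-lib", "README.md")   # miss: rendered once
cache.get("octo/quiet-lib", "README.md")   # hit: served from cache
cache.on_push("octo/quiet-lib")            # a push invalidates the repo's pages
cache.get("octo/quiet-lib", "README.md")   # miss again: re-rendered after the push
print(cache.hits, cache.misses)            # prints "1 2"
```

A repo that rarely gets pushed to stays in cache almost indefinitely under this scheme, while one that changes constantly gets close to no benefit from it.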
Yes, it's potentially a write-heavy workload that also needs to be consistent, a.k.a. the worst-case scenario.

The easy solutions like caching and read replicas don't work and you're forced to go the route of sharding or similar techniques that have much more painful tradeoffs.

I'm not sure if that's why everything keeps breaking, but at that scale write-heavy workloads are never going to be easy.

They are largely responsible for all of that. They have diversified into a lot of random things instead of focusing on their core business. They have actively pushed people to use the service and its features more.

Think about the countless actions that have to run on almost every push and PR update! Also, remember that we used to use external services for "actions", and they basically killed the competition by offering their own CI actions at no cost to most users.

Also, they did a lot of reworks in recent years, not necessarily for the better (like the PR diff page), and probably not in the most efficient way.

Not a valid excuse without knowing what their historical growth rate has been, and how much of the instability is load-related.
GitHub has been publishing their growth numbers since at least 2016: https://octoverse.github.com/2016/

However, they have reported numbers along rather inconsistent dimensions. Historically they've focused on the number of repos and users, later on PRs and issues, and often on catch-all terms like "contributions", which include all of those plus comments etc. But the number of commits alone (which apparently is the main culprit now?) has been mentioned very sporadically. This has made it hard to get a consistent sense of historical growth.

Without any other information, however, it is reasonable to assume that a 14x in commits is the prime candidate for instability. Especially since commits are write traffic, which is much harder to scale than read traffic. Plus every 3-5x increase in scale can reveal bottlenecks in your distributed systems that you never knew existed, so they probably have like 2-3 "generations" of bottlenecks to figure out!
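For anyone who wants to check that last bit of napkin math, here's a quick Python sketch. It assumes each "generation" of bottlenecks corresponds to one 3x to 5x jump in scale (a rule of thumb, not a measured fact) and simply asks how many such jumps fit into 14x; the function name is just something made up for the example.

```python
import math

def bottleneck_generations(growth: float, step: float) -> float:
    """How many successive `step`-fold increases it takes to reach `growth`-fold."""
    return math.log(growth) / math.log(step)

growth = 14.0  # the year-over-year commit multiple quoted in this thread
low = bottleneck_generations(growth, 5.0)   # if a new bottleneck shows up every 5x
high = bottleneck_generations(growth, 3.0)  # if a new bottleneck shows up every 3x
print(f"{low:.1f} to {high:.1f} generations")  # prints "1.6 to 2.4 generations"
```

So under that assumption it comes out to roughly two generations, which is in the same ballpark as the "2-3" guess once you round up.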