by eddsh1994 1329 days ago
Why do you assume the outages are language related and not due to the complex product having bugs? How does Rust prevent bad schema changes or missing data in the DB?

Because I worked with Rails for 6.5 years. Once we were past small scale, outages were a weekly occurrence at best.

Obviously I can't know for sure but it's not an uninformed assumption.

You know you can just click on the post title; that will open the posted link, where you can read the detailed cause of every outage they had that month.

If you do this, you will realize that none are close to what you describe.

Also, have you considered that if you had weekly outages while billion-dollar companies continued to stick with Rails, maybe you were the problem?

I did read the article. One of the incidents was about their webhook worker(s) being swamped -- plus errors because DB data needed to process the events had been deleted. So I'd count that one as a slow endpoint attributable to Ruby on Rails (and it's famous for that).

And even if zero of their incidents pointed to performance problems with Rails, I worked with it a lot and I know for a fact that it's a factor.

Your snark doesn't change reality but you are free to pretend otherwise, fine with me.

> Also, have you considered that if you had weekly outages while billion-dollar companies continued to stick with Rails, maybe you were the problem?

Sure, a programmer with no executive power to change the deployment tech and server (Puma at the time) is the problem. Especially after he did a study demonstrating the issues, calculated how much programmer time was wasted on them every week, and still got ignored. Perhaps I am the problem indeed!

> their webhook worker(s) being swamped

That's a capacity problem caused by a logic bug. Nothing stack specific. If you throw more work at a system than it is designed to handle, you'll hit a bottleneck.

> Your snark doesn't change reality

What reality? You are just barking your uneducated opinion. No one who ever worked on a service anywhere close to the scale of GitHub (regardless of the stack) would make such statements.

> However, many of these events caused exceptions in our webhook delivery worker because data needed to generate their webhook payloads had been deleted from the database. Attempting to retry these failed jobs tied up our worker and it was unable to process new incoming events, resulting in a severe backlog in our queues.

I bet you I could cause this bug on a Rust product if you let me near the code ;)
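For what it's worth, here is a minimal sketch of that failure mode and one common mitigation (a retry budget plus a dead-letter list). It's in Python only to make the point that the logic is stack-independent. Every name in it (`build_payload`, `drain`, `MAX_ATTEMPTS`) is made up for illustration; it is not GitHub's code, and the post doesn't say how they actually fixed it.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget, not a value from the post


def build_payload(event, db):
    # Raises KeyError when the row the event refers to has been deleted.
    return {"event": event["id"], "data": db[event["record_id"]]}


def drain(queue, db, max_attempts=MAX_ATTEMPTS):
    """Process events; park permanently failing ones instead of retrying forever."""
    delivered, dead_letter = [], []
    while queue:
        event = queue.popleft()
        try:
            delivered.append(build_payload(event, db))
        except KeyError:
            event["attempts"] = event.get("attempts", 0) + 1
            if event["attempts"] >= max_attempts:
                dead_letter.append(event)  # give up so new events keep flowing
            else:
                queue.append(event)  # retry later, behind newer events
    return delivered, dead_letter


db = {1: "row still present"}
queue = deque([
    {"id": "e1", "record_id": 99},  # its row was deleted
    {"id": "e2", "record_id": 1},
])
delivered, dead_letter = drain(queue, db)
```

Drop the `max_attempts` check and `e1` gets retried forever, which is the backlog described in the quote. Nothing in that depends on the language.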

Oh, absolutely. It can happen everywhere -- in theory.

In practice, however, I've found people working with certain languages and stacks to be more thorough. It still largely depends on the person in the important position though, that much is always true.