by tmsh 3787 days ago
> Over the past week, we have devoted significant time and effort towards understanding the nature of the cascading failure which led to GitHub being unavailable for over two hours.

I don't mean to be blasphemous, but from a high level, are the performance issues with Ruby (and Rails), which necessitate close binding with Redis (i.e., lots of caching), part of the problem?

It sounds like the fundamental issue is not Ruby, nor Redis, but the close coupling between them. That's sort of interesting.

3 comments

No, the fundamental issue is that an application should not require any external service to boot.

It has nothing to do with Ruby, Rails, or even Redis. It's just a design flaw in the application, one you often learn about the hard way.
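
To make that concrete, here is a minimal sketch of deferring the connection until first use, so that boot never touches the network. Everything in it is hypothetical: `FakeRedisClient` is a stand-in that only simulates an outage, `LazyCache` is an illustrative wrapper, and none of this is taken from GitHub's code or from a real Redis client library.

```ruby
# Hypothetical stand-in for a real Redis client; it only simulates an outage.
class FakeRedisClient
  class ConnectionError < StandardError; end

  def initialize(available:)
    # Connecting eagerly in the constructor is what ties boot to the service.
    raise ConnectionError, "redis unreachable" unless available
    @store = {}
  end

  def get(key)
    @store[key]
  end
end

# Illustrative wrapper: holds a block that builds the client and calls it
# only on first use, so constructing the wrapper cannot fail.
class LazyCache
  def initialize(&connector)
    @connector = connector
    @client = nil
  end

  def get(key)
    client.get(key)
  end

  private

  def client
    @client ||= @connector.call
  end
end

# "Boot": the wrapper is created while the service is down, and nothing raises.
cache = LazyCache.new { FakeRedisClient.new(available: false) }
booted = true
```

The outage still surfaces, but only when a request actually calls `cache.get`, where it can be handled per request instead of preventing the process from starting.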

I don't think that Ruby/Rails has anything to do with this, really. If you want to scale any app, you're going to want to do some caching somewhere. What this boils down to is that their app has an initializer that depends on redis. Without a connection to redis, it will flap.

As someone with a fair bit of ruby+rails+redis experience, I don't think this is blasphemous, but I also don't think the performance issues of ruby/rails have anything to do with the failure. Generally you would cache/store something in redis not because your programming language or framework is slow, but because a query to another database is slow (or at least, slower than redis), or because redis data structures happen to be a good/quick way to store certain kinds of data.

I believe the fundamental issue was just that redis availability was taken for granted by app servers, so that certain code paths/requests would fail if it wasn't available, rather than merely being slower.
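
A rough sketch of the "slower rather than failing" behaviour, assuming the cached data can also be obtained from a slower source. All names here (`CacheUnavailable`, `DownCache`, `WorkingCache`, `fetch_with_fallback`) are made up for illustration and do not come from the post-mortem or from any Redis library.

```ruby
# Hypothetical error raised when the cache cannot be reached.
class CacheUnavailable < StandardError; end

# Hypothetical cache that is always down, to simulate the outage.
class DownCache
  def get(_key)
    raise CacheUnavailable, "cache unreachable"
  end
end

# Hypothetical healthy cache backed by a plain Hash.
class WorkingCache
  def initialize(data)
    @data = data
  end

  def get(key)
    @data[key]
  end
end

# Returns the cached value when there is one; on a miss or an outage it
# runs the block, which stands for the slower authoritative lookup.
def fetch_with_fallback(cache, key)
  cached = begin
    cache.get(key)
  rescue CacheUnavailable
    nil # treat an outage exactly like a cache miss
  end
  cached.nil? ? yield : cached
end

# With the cache down, the request is served from the slow path.
result = fetch_with_fallback(DownCache.new, "repo:1") { "value from primary database" }
```

The request still completes with the cache down; it only pays the cost of the slow lookup. A real version would also need timeouts so that an unreachable cache fails fast instead of hanging each request.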