by iagooar 3425 days ago
I'm a bit curious here. Do you think that your issues with scalability and reliability have to do with your tech choice (I think it was Ruby on Rails)? Don't want to bash Rails, I'm just genuinely curious, since I come from a Rails background as well and have seen issues similar to yours in the past.
5 comments

It's not just the tech stack, but a combination of the technical choices made and the human procedures behind them. We're actively pushing towards getting everybody to focus on scalability, but there's still a lot of debt to take care of.
You can check out their codebase here: https://github.com/gitlabhq/gitlabhq

Just looking at their gemfile is rather telling: a couple hundred gems. I've always felt that if you're going above 100, you should carefully consider how much your codebase is trying to achieve.

They're probably at the point where they really want to think about splitting off of their monolith codebase and into microservices.

Yeah, given how their ops situation is, I don't think that would be a good idea.
Maybe it's because I'm familiar with almost all of the gems, but I don't see anything wrong with their Gemfile. It's a pretty complex project, and they really do have a ton of integrations and features that need those gems.

There's probably a few small libraries that they could have rewritten in a few files (never a few lines), but what's the point? The version is locked, and code can always be forked if they need to make changes (or contribute fixes).

> (never a few lines)

You'd be surprised what you can do by carefully considering what the desired outcome actually needs to be.

Maybe there is justification for all the gems in gitlab's Gemfile, I didn't go through it with a fine tooth comb - but this reaffirms my experience that complex projects outgrow monolith codebases. Having an infrastructure outage take down your entire business is kind of a symptom of that.

> I've always felt that if you're going above 100, you should carefully consider how much your codebase is trying to achieve.

This is a mindset issue. Some communities reject NIH so strongly that you get the opposite problem that everything depends on hundreds of different developers. Gitlab can start some library forks with more stuff integrated, or change communities. Microservices is something that can't help, as all the dependencies will stay just where they are (Gitlab is already uncoupled to some extent).

But, anyway, most of those are stable¹, and I doubt many of Gitlab's problems come from dependencies.

1 - They are unbelievably stable for somebody coming from the Python world. When I first installed Gitlab, I couldn't believe how easy it was to get a compatible set of versions.

I see the opposite of NIH especially in the RoR/Ruby world and I don't think it's always a good thing. Developers reach for a library for one piece of functionality in a discrete area of the codebase when they could have achieved the same functionality with a few lines of code. That's not automatically NIH, that's being pragmatic about the dependencies you're bringing in and are going to need to support moving forward.
It is fairly large, but I still find it more organized than some examples I've seen.

Also, I don't see another very common issue with big gemfiles: they don't seem to have multiple solutions for one thing in there (i.e. multiple REST clients, DB mockers, etc).

I've considered setting up gitlab locally, and have a couple of students that are trying to set it up on a vps. Customizing their bundle installer is... an interesting learning experience in managing complex *nix servers.

I think it's telling that their standard offering/suggestion for self-hosters is as complex as it is. While on the one hand I applaud the poor soul that maintains the script that tries to orchestrate five(?) services on a general, random, unix/linux server without any knowledge/assumption on what other things are running there -- it unsurprisingly falls over in "interesting" ways when you try to do radical stuff like install it on a server that runs another copy of nginx with various vhosts etc.

Now, running services like gitlab at "Internet scale" is far from trivial - but running it at "office scale" should be.

I fully understand how gitlab ended up where they are - but ideally, the self-host version should just need to be pointed at a postgresql instance, and be more or less a "gem install gitlab" (or similar) away -- popping up with some ruby web-server on a high port on localhost -- and come with a five-line "sites"-config for nginx and apache for setting up a proxy.

I really don't mean to complain - it's great that they try to provide an install that is "production ready" -- but if the installer reflects the spirit of how they manage nodes on the gitlab.com side -- I'm surprised they manage to do any updates at all with little down-time...

For now I'm running gogs - and it seems to be more of a "devops" developed package - where deployment/life-cycle has been part of the design/development from the start. Single binary, single configuration file. Easily slips in behind and plays well with simple http proxy setups.

At some point I'll find a day or two to migrate our small install to gitlab (we could use the end-user usability and features) -- but I know I'll need to have some time for it. Time to migrate, time to test the install, time to test disaster-recovery/reinstall from backup... all those steps are slowed down and become more complex when the stack is complex.

(I'll probably end up letting gitlab have a dedicated lxc container, although I'll probably at least try to figure out how to reliably use an external postgres db -- it pains me to "bundle" a full fledged RDBMS. These things are the original "service daemons", along with network attached storage and auth/authz (LDAP/AD etc)).

LOL. GitHub is also a RoR shop.
It might be. I'm not saying it's impossible to scale Rails. It's just very, very hard. Github can do this, because they probably get the best of the best engineers. They even used to have their own, patched Ruby version.

Not everyone can afford that.

Why do you question Rails while the entire report is about Postgres only?

And as someone working on one of the biggest and oldest Rails codebases out there, I can tell you that in terms of scaling, Rails is the least of our concerns.

Sure it's not as efficient, so it's gonna cost you more in CPU and RAM, but it's trivial to scale horizontally. The real worry is the databases; they are fundamentally harder to scale without tradeoffs.

As for the patched Ruby, we used to have one too (but our patches landed upstream so now we run vanilla). It's not about being able to scale at all. It's simply that once you reach a certain scale, it's profitable to pay a few engineers to improve Ruby's efficiency. If you have 500 app servers, a 1 or 2% performance gain will save enough to pay those engineers' salaries.
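
A back-of-the-envelope version of that arithmetic, as a minimal Python sketch. The 500 servers and the 1-2% gain come from the comment above; the function names are made up for illustration, and no real server or salary cost is assumed, so the second function only tells you the per-server cost at which the work would pay for itself.

```python
def servers_freed(app_servers, efficiency_gain):
    """Capacity recovered by an efficiency gain, measured in whole-server equivalents.

    efficiency_gain is a fraction, e.g. 0.02 for a 2% improvement.
    """
    return app_servers * efficiency_gain


def break_even_server_cost(app_servers, efficiency_gain, engineering_cost):
    """Yearly cost per server at which the savings equal the engineering spend.

    engineering_cost is a hypothetical yearly figure supplied by the caller;
    it is not taken from the thread.
    """
    return engineering_cost / servers_freed(app_servers, efficiency_gain)


# 500 app servers with a 1% and a 2% gain free up roughly 5 and 10 servers' worth of capacity.
low = servers_freed(500, 0.01)
high = servers_freed(500, 0.02)
```

If a server costs more per year than `break_even_server_cost(...)` returns for your own numbers, the optimization work pays for itself; below that, it does not.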

Depending on hundreds of gems means you are depending on the decisions of hundreds of developers with packages which are in constant churn.

Apps like Gitlab and Discourse that depend on hundreds of gems and require end users to have a complex build environment and compile software are, I think, operating a broken, user-hostile model.

The potential for compilation failures, version mismatches and Ruby oddities like RVM is so gigantic, with hundreds of man-hours wasted, that one is left to conclude they may actually want to run a hosting business and not have users deploy themselves.

Compare that to Go or even PHP, where things are so many orders of magnitude simpler that it is not even the same thing. To deal with this complexity you now have containers, but have you solved the complexity or added another layer of complexity? There are technical but I think also social factors at play here.

Regardless of whether I agree or disagree with your critique, it has absolutely no relevance in the context of the current outage.

You don't like Ruby / Rails, we get it. But that's totally off topic.

I don't think it's that. GitLab IS a complex setup and Rails is not helping to make it simple. There is a ton going on in the stack and the company only has limited resources.
It's not hard to scale a Rails server, when compared to other frameworks and languages. It's exactly the same as scaling a server written in Java, Node.js, Python, or any other language. You just spin up more machines and put them behind a load balancer.
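
To make "more machines behind a load balancer" concrete, here's a minimal round-robin sketch in Python. It's only an illustration of the idea, not how any real load balancer (or GitLab/GitHub) does it; the class name and the `app1:3000`-style addresses are hypothetical, and it assumes the app servers are stateless so any of them can take any request.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend addresses in strict rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._pool = cycle(self._backends)

    def next_backend(self):
        # Each call returns the next server in the rotation.
        return next(self._pool)

    def add_backend(self, backend):
        # Scaling out: register one more app server.
        # The rotation restarts from the first backend after a change.
        self._backends.append(backend)
        self._pool = cycle(self._backends)


lb = RoundRobinBalancer(["app1:3000", "app2:3000"])  # hypothetical hosts
first_four = [lb.next_backend() for _ in range(4)]
lb.add_backend("app3:3000")
after_scaling = [lb.next_backend() for _ in range(3)]
```

Nothing in this is specific to Rails, which is the point: the app tier scales by adding entries to the list, while the shared database behind it does not.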

Yes, Ruby is slower than other programming languages, but this usually doesn't matter. If you are charging people to use your software, or even if you are serving ads, you will always be making money before you need a second server. Plus, Rails is super productive, so you'll be able to build your product much faster.

I'm not sure why GitHub used a patched Ruby version, but no, that's not necessary.

Having said all of that, I'm moving towards Elixir and Phoenix. Not just because of the performance, but also because I really like the language and framework.

Nah, this is just about having a robust backup system.