Hacker News
by onebot 1493 days ago
I honestly love Ruby and Ruby on Rails, but I can't understand why companies like Shopify and GitHub go through so much effort to scale Ruby, especially at their size. Maybe I am wrong, but couldn't this effort be put into rewriting parts of it in a more performant language like Go or Rust? One has to imagine that they have a large codebase: how much developer time is spent writing tests for Ruby? How much time was spent debugging odd monkey-patching gems over the life of the codebase?

I do get that developer time was/is more expensive than servers. But I am not so sure that holds at a certain scale, when you need 100 servers instead of 5 and have to spend so much testing effort dealing with a dynamic language, etc. And then you build custom compilers, special tools for tracing, entire architectures to deal with a single-threaded model, etc. Between GitHub and Shopify alone, they could probably have built a very Rails-like framework in a language better suited to the size and scale of these platforms.

7 comments

> they could have probably build a very Ruby on Rails like framework on a language more suited to the size and scale of these platforms.

I have a hunch they would rather have tens of thousands of other folks using a framework that has massive community support and folks other than them directly maintaining it.

Also, being able to Google almost any Rails problem and find multiple really good answers is worth paying 5 or 10 times more in compute on your application servers, because dev time is expensive at any scale.

If you're paying 2,000 developers $150k+ a year, that's $300 million, without accounting for anything that scales off base salary (bonuses, 401k matching, etc.). If you can save each developer 5 hours a week because the Rails community exists, that's 10,000 dev hours saved per week. An average person works, let's say, 1,900 hours a year, so that's roughly 5.2 years of dev time saved every week, in both opportunity costs and direct costs. The direct costs alone come to ~$790k per week. I don't know what Shopify or a bigger place pays for application servers alone, but I'm guessing it's well worth hosting Rails instead of building their own framework in a more computationally efficient language.
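
The arithmetic above can be checked with a few lines of Python (chosen only because the thread has no code of its own). Every input is the comment's own hypothetical figure, not real Shopify or GitHub data; `loaded_multiplier` is an added knob for the costs that scale off base salary, which the comment deliberately leaves out.

```python
# Back-of-the-envelope check of the comment's numbers. All defaults are the
# comment's hypothetical inputs, not real company data.
from dataclasses import dataclass


@dataclass
class Inputs:
    developers: int = 2_000
    salary: float = 150_000.0         # base salary per developer, USD/year
    hours_saved_per_week: float = 5   # per developer
    hours_per_year: float = 1_900     # working hours per developer-year
    loaded_multiplier: float = 1.0    # 1.0 = base salary only, no benefits


def estimate(i: Inputs) -> dict[str, float]:
    # Cost of one developer-hour, optionally scaled for benefits/overhead.
    hourly_cost = i.salary * i.loaded_multiplier / i.hours_per_year
    hours_saved = i.developers * i.hours_saved_per_week
    return {
        "annual_payroll": i.developers * i.salary * i.loaded_multiplier,
        "hours_saved_per_week": hours_saved,
        "dev_years_saved_per_week": hours_saved / i.hours_per_year,
        "direct_savings_per_week": hours_saved * hourly_cost,
    }


if __name__ == "__main__":
    for name, value in estimate(Inputs()).items():
        print(f"{name}: {value:,.2f}")
```

With the defaults this reproduces the comment's figures: $300M payroll, 10,000 hours and about 5.26 developer-years saved per week, and roughly $789k in direct savings per week.
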

I think these numbers are really generous too. I'm guessing using Rails is saving a lot more than 5 hours a week of dev time per developer.

I've kept hearing that argument my entire career... and I've witnessed at least 7 cases where it was flat-out false.

The point is that nobody even tries the thing you're also writing off, so there's an inherent confirmation bias in claims like "rewrites are hard" or "in-house frameworks fail". I think we should recognize that.

There are ways to migrate away from a slow technology surgically and gradually. I've done it several times in my career, and my only failure came from not knowing all the business requirements -- lesson learned, and I never made that mistake again.

I am pretty sure that in orgs like GitHub and Shopify the business requirements can be gathered and catalogued. It's all in there.

The rule of thumb for “fully loaded” employee cost (including benefits, etc.) is 2x salary. So double the cost side :)

As you say, engineering staff costs outweigh app server costs by so many multiples it’s crazy.

Now modern data eng/warehouse stacks on the other hand, those are money fires :)

> Now modern data eng/warehouse stacks on the other hand, those are money fires

No, but I totally need a 100k+ contract (plus extra for runtime costs) to do…database stuff that…uhh, no other column-based database is, uhh, totally capable of. Yeah. /s

Also we totally gotta buy these other data tools, because all of our “data” people refuse to learn even basic code, and can only write sprawling notebooks.

Oh, plus gotta pay for some Databricks/dbt to run my sprawling notebooks on huge machines, because our Python-based runtime is single-threaded, can’t utilise the full CPU properly and requires silly amounts of memory, but it’s ok because “developer velocity” let us put an incomprehensible notebook into production a whole 15 seconds faster.

How much did they spend to write this compiler? Also, I'd guess they spend at least 5 hours a week writing tests to handle all the edge cases in any new code they commit, because the language is dynamic. And what kind of community is there for a custom compiler, or for non-standard high-performance things like using CRuby? I think it's easy to assume developer productivity is increased, and I think it is for small/medium projects. But at this scale and complexity, my assumption is that they are likely spending more development resources supporting the language in their high-performance environment.

Having a statically typed language doesn't mean you don't write tests.

Sure, there's a subset of tests you can avoid writing, but you're still going to write a ton of tests with either kind of language. I also think this point gets blown out of proportion. For example, I don't write a lot of boilerplate "what if I pass type X to variable Y" tests in Rails, because I trust the database and Rails' validations. If I have a datetime column in the DB, which Rails automatically maps to a datetime, I'm not going to write separate tests for what happens if I pass in a string, integer, boolean, list, hash and so on. I would write a test making sure the date falls within a specific range if I hooked up a validation rule to limit it, but I'd write that same test in a statically typed language too.
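
A rough illustration of the kind of test that survives in either world, written in Python with type hints standing in for a statically checked language (an assumption for illustration; in Rails this would be a model validation plus a model test). The field name `published_on` and the date bounds are hypothetical. The annotation lets a checker such as mypy reject non-`date` arguments, but the range rule still needs its own tests.

```python
# Hypothetical range rule for a date field. Type hints cover "what if I pass
# a string?"; the range itself still has to be tested explicitly.
from datetime import date

EARLIEST = date(2000, 1, 1)   # hypothetical lower bound
LATEST = date(2030, 12, 31)   # hypothetical upper bound


def validate_published_on(value: date) -> list[str]:
    """Return a list of error messages; an empty list means the value is valid."""
    errors = []
    if value < EARLIEST:
        errors.append(f"must be on or after {EARLIEST.isoformat()}")
    if value > LATEST:
        errors.append(f"must be on or before {LATEST.isoformat()}")
    return errors


# pytest-style tests: the same three cases you'd write in Ruby or Go.
def test_accepts_dates_inside_the_range() -> None:
    assert validate_published_on(date(2015, 6, 1)) == []


def test_accepts_the_boundaries() -> None:
    assert validate_published_on(EARLIEST) == []
    assert validate_published_on(LATEST) == []


def test_rejects_dates_outside_the_range() -> None:
    assert validate_published_on(date(1999, 12, 31)) != []
    assert validate_published_on(date(2031, 1, 1)) != []


if __name__ == "__main__":
    test_accepts_dates_inside_the_range()
    test_accepts_the_boundaries()
    test_rejects_dates_outside_the_range()
    print("all range tests passed")
```
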

Also, Stripe has a good write-up[0] on how they use Sorbet to add type checking to Ruby, and how they applied it to a multi-million-line codebase without disrupting developers too much. Basically, a small team did most of the work, including writing the tool and migrating the codebase. I haven't used it personally, but it exists and has been used successfully at scale.

[0]: https://stripe.com/blog/sorbet-stripes-type-checker-for-ruby

The "parts of it" in your question might be the clue to an answer: you start reimplementing some frequently used code path, only to realize that it shares code with all the rarely used code paths (permissions/authentication come to mind).

So you'd need to reimplement a whole bunch of your codebase, at which point you'll have two versions of everything that you need to keep in sync.

That's exactly why most COBOL rewrite projects in banking have failed miserably. It is much more complex than rewriting a complex CRUD app. It's not just about the main program; it's also everything that connects to it and expects it to behave in very specific ways.

I mean, yes, it's an investment for sure, but the whole YJIT team is 6 people. They are well compensated; let's say they cost $2.5 million a year. That's still peanuts compared to a rewrite. Ruby is serving them well, they have tons of Ruby experts, their codebase is optimized, etc., so why would anyone want to ruin that?

But compare the long-term costs. I think there have been countless examples of teams going from Ruby to Go and eliminating 80 to 90% of the infrastructure they needed. I can't begin to estimate Shopify's infrastructure costs, but if they are spending $1M/month and could eliminate 90% of that, that is not peanuts.
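
A quick sketch of that comparison, using only the figures floated in this subthread: a hypothetical $1M/month infrastructure bill, a 90% cut, and the $2.5M/year estimate for the YJIT team. None of these are real Shopify numbers.

```python
# Compare yearly infrastructure savings against a yearly team cost.
# All inputs are the thread's hypothetical figures.


def annual_infra_savings(monthly_spend: float, reduction: float) -> float:
    """Yearly savings if infrastructure spend drops by `reduction` (0..1)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return monthly_spend * reduction * 12


savings = annual_infra_savings(1_000_000, 0.90)  # hypothetical $1M/month, 90% cut
jit_team_cost = 2_500_000                        # the thread's $2.5M/year estimate

print(f"infra savings per year: ${savings:,.0f}")
print(f"ratio to JIT team cost: {savings / jit_team_cost:.2f}x")
```

On those assumptions the savings come to about $10.8M a year, roughly 4.3 times the team estimate. This only compares running costs; the one-off cost of the rewrite itself is not modelled.
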

Their monolith is millions of lines of code, one of the biggest in the world, so I don't think a two-year-old startup writing a "how we moved from Ruby to Go" blog post is quite the same as what Shopify would have to do to rewrite its monolith. Add to that the productivity lost, given how many Ruby experts they have. It would be completely disruptive at a time when Shopify has to grow like crazy and add new features.

So I think they know what they're doing.

Also, I'm somewhat skeptical of all those rewrite success stories. It seems to me that a CTO or principal engineer who decided on the rewrite will never admit it was a bad decision, since his job depends on it, so of course he'll make the case for how great and beneficial the rewrite was. I bet there are many stories where the rewrite was detrimental to the business.

I can easily turn your argument exactly in the opposite direction:

Since there's always huge conservatism around rewriting or building in-house frameworks, there's a confirmation bias in these stories: "whew, we dodged a bullet by not even trying X".

Yeah, everyone could have said that.

I've witnessed and participated in at least 7 successful rewrites. You don't hear about them, though, because people read HN and think "I am not willing to engage with biased people, so let's keep it to ourselves".

That's an aspect of these conversations that a lot of people around here don't account for: the people who get stuff done are quiet. This should be included in analyses but often isn't.

---

...And finally, millions of lines of code in a monolith aren't that scary. Find a part that has minimal dependencies on everything else, rewrite it, put a reverse proxy in front that routes that particular endpoint to the new code, test for a bit, done. Rinse and repeat. The process itself is trivial, not especially creative, and more laborious than anything else.
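
The reverse-proxy step described above is the heart of what's often called the strangler fig pattern. A minimal sketch of just the routing decision, assuming a prefix-based route table; the hostnames, ports and endpoints are made up, and in practice this would live in the proxy's own config (nginx, Envoy, etc.) rather than in application code.

```python
# Hypothetical route table for carving endpoints out of a monolith one slice
# at a time. Anything not explicitly migrated still goes to the monolith.
from urllib.parse import urlsplit

MONOLITH = "http://monolith.internal:3000"

# Path prefix -> backend that now owns it. Add one entry per migrated slice.
MIGRATED = {
    "/api/shipping-rates": "http://shipping-rates.internal:8080",
    "/api/tax": "http://tax.internal:8080",
}


def pick_backend(url: str) -> str:
    """Return the backend for a request URL; the longest matching prefix wins."""
    path = urlsplit(url).path
    best = None
    for prefix in MIGRATED:
        # Match whole path segments so "/api/taxes" doesn't hit "/api/tax".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return MIGRATED[best] if best is not None else MONOLITH
```

Rolling a slice back is just deleting its entry, which is part of why the process stays boring.
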

I've witnessed and participated in rewrites that were catastrophic. To each his own experience.

I do trust that Shopify knows what they're doing; we're gonna have to agree to disagree here.

Sure, I have no problem agreeing to disagree. :)

I feel obliged to point out that "I trust that $BIG_COMPANY knows what they are doing" is very often in reality "There are gatekeepers inside the tech teams that are custodians of tradition". Been in plenty of companies and that's often the non-romantic truth.

I'd just end this by saying that a lot of teams don't make their calls in as scientific and objective a manner as you seem to imply. I wish that were the case, but it's not what I've seen. Bad luck or me sucking at picking employers, I suppose.

This isn't a counterargument as such, but couldn't you apply similar logic to, say, banks and COBOL? I'm not in banking, but I imagine there are many good reasons to keep writing and maintaining COBOL, and these companies may have similar stories to tell about Ruby.

COBOL and many legacy systems have the nasty problem of everything being interconnected: a textbook example of spaghetti code.

So in such systems, just to begin properly, you have to rewrite a sizeable chunk of the legacy code before you can even showcase a first demo / MVP.

Needless to say, the relentless culture of our modern times means businesspeople never approve such projects.

COBOL is a bit of a different beast :)

Very interesting article, and related discussion, here: https://news.ycombinator.com/item?id=29221877.

> Maybe I am wrong, but couldn't this effort be put to rewriting parts of it in a more performant language like Go or Rust?

This is absolutely a valid point, and I'm slightly perplexed as well.

Pragmatic (and smaller) companies like GitLab have indeed approached the problem by rewriting parts of their product in more performant languages (Golang in GitLab's case).

Another pragmatic approach is Stripe's AOT compiler, which is (I suppose) much less resource-intensive to develop; it is, in a way, the "optimize-the-bottleneck" approach rather than an attempt to improve the performance of the whole language.

> they could have probably build a very Ruby on Rails like framework

This would cost a lot in terms of community and support, it would take a very long time to release, and it would also cost them a lot to migrate to.

All in all, it's perfectly possible that Rails itself is hard to optimize, and that for large-scale monoliths, splitting out microservices (in moderation!) may not be effective and/or efficient. But I'm still curious why GitLab's (moderate!) microservice approach wasn't chosen for Shopify.

>I honestly love Ruby and Ruby on Rails, but I can't understand why companies like Shopify and Github go through so much effort to scale Ruby especially at their size. Maybe I am wrong, but couldn't this effort be put to rewriting parts of it in a more performant language like Go or Rust? One has to imagine that they have a large code base, how much developer time is spent writing Tests for Ruby? How much time was spent debugging odd monkey patching gems over the life of the codebase?

Me neither, especially since there's Crystal, which has Ruby-like syntax and runs 3 times faster than Go. Invidious (a YouTube proxy) is written in it. Ruby developers ought to just switch to it for speed.

Crystal is not production ready. It doesn't even have production-ready parallelism (and there has been no discussion about it for a couple of years), so it can't be compared to Go.

I'd love to see large-scale adoption, but it's stuck in the typical vicious circle of no users <> little development.

Additionally, if they don't release parallelism support quickly, they'll be forever stuck with an ecosystem designed around the assumption that only one thread runs at a time¹. This was a deal-breaker for me when I evaluated it for use at my company.

¹ This is a very serious problem. Ruby has implemented parallelism (via Ractor), but the vast majority of libraries assume a single thread running at a time; a project that uses parallelism will likely have many subtle bugs caused by those libraries.
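
The footnote's hazard is not specific to Crystal or Ruby. A minimal Python analogue using threads, assuming the problem pattern is a library that keeps scratch state in a module-level variable; `render_unsafe`, `render_safe` and `run_in_threads` are hypothetical names for illustration only.

```python
# A made-up "library" written for a one-thread-at-a-time world, and a fix.
import threading
from typing import Callable

_buffer: list[str] = []  # shared by every caller: only safe with one thread


def render_unsafe(items: list[str]) -> str:
    _buffer.clear()
    for item in items:
        _buffer.append(item.upper())
    # Another thread may have cleared or appended to _buffer in the meantime.
    return ",".join(_buffer)


_local = threading.local()  # each thread sees its own attributes


def render_safe(items: list[str]) -> str:
    buf = getattr(_local, "buffer", None)
    if buf is None:
        buf = _local.buffer = []
    buf.clear()
    for item in items:
        buf.append(item.upper())
    return ",".join(buf)


def run_in_threads(fn: Callable[[list[str]], str], batches: list[list[str]]) -> list[str]:
    """Run fn on each batch in its own thread and collect results in order."""
    results = [""] * len(batches)

    def worker(i: int) -> None:
        results[i] = fn(batches[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(batches))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Here a plain local variable would also fix it; `threading.local` is shown because real libraries often need per-thread state that persists across calls. The unsafe version usually passes single-threaded tests, which is exactly why these bugs are subtle.
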

I think GitHub, at least, is spinning off parts (not sure about Shopify), but as others are getting at, rewriting a big, business-critical application is risky.

I worked at a place that was completely rewriting a PHP site in PHP (moving from a no-framework, no-tests spaghetti codebase to something structured and tested), and that was a six-year project just to reach feature parity with the old site. During four of those years, features were also being added to both sites in parallel, which doubled the cost of every feature.