by matthewmacleod 4124 days ago
Rewriting a monolithic platform as a set of microservices will almost always result in improvements from every perspective. You're replacing legacy code with new, cruft-free code; you're de-coupling parts of the system; you're building a more scalable design.

For example, if you're in the situation where you've gone from 12GB to 5MB of memory consumption, then you have a design issue and not a language one. That's not Ruby's fault.

We are also in the process of converting several monoliths into a microservice-driven system, but we are doing so with Ruby. It remains a great language for building web apps, and I suspect some of the pushback against it is caused by a dislike of the cliché giant Rails monolith that we've all come across, with hundreds of gem dependencies and so on. Ruby and Rails have a habit of encouraging such bad designs, primarily through their ease of use, but it's hard to see that as a problem with Ruby.

Go is a good tool for doing the same, and single-binary deployment is pretty magic. I'm sure it'll be supported for ages, so if you want to re-tool to use that then go for it! But Ruby's a viable option too, and it's important not to place blame for bad app design in the wrong place.


> 12GB to 5MB of memory consumption, then you have a design issue and not a language one. That's not Ruby's fault.

I thought the same thing; I also find that figure very hard to believe. Perhaps I just lack experience, but if 12 GB of memory usage drops to 5 MB after a rewrite (in another language or not), it seems to me that they did something pretty wrong in their previous iteration.

This is a function of how Rails does concurrency, which is by using multiple processes. Each additional process consumes X MB. If you have something which needs a throughput of 100 requests per second, and each request takes 0.2 seconds, you require a minimum of 20 Rails processes and you probably want to have more.

The X is largely dependent on how many dependencies your app pulls in, on your garbage collection settings, and on how much state you carry around with you. I don't know what an off-the-shelf Rails app requires as a minimum, but for comparison, BCC requires ~120 MB per process and AR requires ~200 MB. Neither is a particularly mammoth code base.

Thus, if AR develops a business requirement for 100 requests per second on something which takes 200 ms per request, my server budget just increased by 4 GB of RAM. That is absolutely unavoidable in the Rails concurrency model. It doesn't matter how well you code the small thing at issue: you pay a memory tax based on desired maximum volume and app complexity, always.
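The arithmetic in the two paragraphs above is Little's law: requests in flight equal arrival rate times time per request, and with one request per process that is also the minimum process count. A minimal Python sketch, assuming a flat per-process footprint (no memory shared between processes) and using the ~200 MB AR figure quoted above; the function names are illustrative, not from any library.

```python
def processes_needed(requests_per_second: int, ms_per_request: int) -> int:
    """Minimum number of single-threaded processes for a target throughput.

    Little's law: requests in flight = arrival rate x time per request.
    With one request per process, that is also the process count.
    Integer milliseconds keep the ceiling division exact.
    """
    in_flight_x1000 = requests_per_second * ms_per_request
    return -(-in_flight_x1000 // 1000)  # ceiling division


def memory_needed_mb(processes: int, mb_per_process: int) -> int:
    """Total RAM if every process carries its full footprint (no sharing)."""
    return processes * mb_per_process


n = processes_needed(100, 200)      # 100 req/s at 200 ms each
print(n, memory_needed_mb(n, 200))  # 20 4000
```

The 4000 MB result is the ~4 GB above; real deployments would add headroom on top of this minimum.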

(Neither here nor there, but AR has features quite similar to the notification feature of the app described in this article. We handle this by using queue workers, which each require 200 MB as described above, and by deciding that queues do not require immediate servicing. Instead, if thousands of calls are dumped into the system, we add them to a queue quickly and then consume the queue leisurely. Different apps are different apps, but I'd wager that we do substantially more volume in terms of calls on ~400 MB of RAM.)
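The "enqueue quickly, consume leisurely" approach keeps the worker count (and so the memory bill) fixed while the queue absorbs bursts; ~400 MB would presumably be two 200 MB workers. A minimal sketch of that shape in Python, using threads and an in-memory `queue.Queue` in place of separate worker processes and a persistent queue backend; `process_backlog` and `handle` are hypothetical names, not AR's actual code.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def process_backlog(jobs, handle, workers=2):
    """Accept all jobs immediately; a fixed pool of workers drains them."""
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is _STOP:
                return
            out = handle(job)  # the slow part, e.g. an outbound API call
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    for job in jobs:  # producer: accepting work is just a cheap put()
        q.put(job)
    for _ in threads:  # one sentinel per worker, queued after the real jobs
        q.put(_STOP)

    for t in threads:
        t.join()
    return results
```

Because the queue is FIFO and the sentinels go in last, every real job is handled before any worker exits; the `workers` parameter, not the size of the burst, bounds concurrency.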

Also neither here nor there, but engineers are expensive and gigabytes are cheap. There exist many circumstances in which the architecture which requires 12 GB of RAM is a defensible engineering tradeoff versus the one which takes 5 MB. As Thomas told me once when I was asking him how to cut 100 MB off of Redis so that I could avoid upgrading a VPS by 1 GB: "How much does the extra gig cost? $10? Why are we even having this discussion?" And he was right.

> Also neither here nor there, but engineers are expensive and gigabytes are cheap. There exist many circumstances in which the architecture which requires 12 GB of RAM is a defensible engineering tradeoff versus the one which takes 5 MB.

This is a vital point — everybody wants to shoot for lovely, elegant systems that use minimal resources. But resources are astoundingly cheap; we've got a bunch of servers with 128GB of RAM, and you can get these for around £100 a month each. That's less than two hours of developer time. If using Ruby in place of another language saves you that much time—or even just makes you a little happier—it's probably worth it.

This is a nitpick, but this isn't entirely accurate:

"This is a function of how Rails does concurrency, which is by using multiple processes."

Multi-process vs. multi-threaded is a matter of application server architecture, not of Rails. Yes, your application must be thread safe, but this has been possible in some form for Rails apps for a long time now [1]. The biggest problem in recent history has been gems and their thread-safetyness (is that a word?). The number of threaded Ruby application servers has been on the rise for a long while now, and even Passenger (traditionally the multi-process front-runner) is going threaded in version 4.0.

1: An interesting blog post related to config.threadsafe! (which was a huge source of confusion): http://tenderlovemaking.com/2012/06/18/removing-config-threa...
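What "thread safe" asks of app code (and of Sidekiq jobs, as discussed below) is mostly that shared mutable state is guarded. A minimal Python illustration of the classic case; `RequestCounter` and `hammer` are hypothetical names used only for this example.

```python
import threading


class RequestCounter:
    """Shared, mutable state of the kind that must be guarded under threads."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self._count += 1` is a read-modify-write; without the lock, two
        # threads can read the same value and one of the updates is lost.
        with self._lock:
            self._count += 1

    @property
    def count(self):
        with self._lock:
            return self._count


def hammer(counter, threads=8, per_thread=10_000):
    """Increment the counter concurrently from several threads."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    ts = [threading.Thread(target=work) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter.count
```

With the lock, `hammer(RequestCounter())` always returns 80000; state that lives only inside one request or job needs no such guard, which is why making individual jobs thread safe is usually a smaller task than auditing a whole app.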

Thanks, you explained it very well.

Edit: Our requirement was to process this queue as fast as possible and that means more workers. With process based concurrency that is very costly as you have explained.

Yeah, everyone wants to process their queue as fast as possible but "as fast as possible" practically means a cap on the maximum allowed delay. Otherwise, why stop at 30 workers? Go for 300. 3000?

Also, if the workers shared all the code, you could have used unicorn to fork the processes after the code loading was complete. The 400MB per process would then instantly come down to something ~10MB per process at which point rewriting would have been delayed for another year or so.
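Both points in this comment can be turned into quick estimates: a maximum allowed delay fixes the worker count, and preloading before forking changes what each worker costs. A sketch, assuming jobs of uniform duration handled one at a time per worker, and treating preloaded code as perfectly shared copy-on-write memory; `workers_needed` and `fleet_memory_mb` are hypothetical helpers, and the shared/private split is an assumption rather than a measurement.

```python
def workers_needed(backlog: int, ms_per_job: int, max_delay_ms: int) -> int:
    """Smallest worker count that clears `backlog` jobs within `max_delay_ms`.

    Assumes every job takes `ms_per_job` and each worker runs one job at a time.
    """
    jobs_per_worker = max_delay_ms // ms_per_job
    if jobs_per_worker == 0:
        raise ValueError("a single job takes longer than the allowed delay")
    return -(-backlog // jobs_per_worker)  # ceiling division


def fleet_memory_mb(workers: int, per_process_mb: int, shared_mb: int = 0) -> int:
    """Estimated total RAM for `workers` processes.

    Without preloading, each process pays `per_process_mb` in full. If the
    code is loaded once and workers are forked afterwards, `shared_mb` of
    that footprint can stay in shared copy-on-write pages: it is counted
    once, and each worker adds only its private remainder.
    """
    if not 0 <= shared_mb <= per_process_mb:
        raise ValueError("shared_mb must be between 0 and per_process_mb")
    private_mb = per_process_mb - shared_mb
    return shared_mb + workers * private_mb
```

For illustration, 1000 queued calls at 200 ms each with a 60 s cap gives `workers_needed(1000, 200, 60_000)` = 4, and tightening the cap to 10 s gives 20. With the numbers mentioned in this thread, 30 workers at 400 MB with nothing shared is `fleet_memory_mb(30, 400)` = 12000 MB, roughly the 12 GB figure, while `fleet_memory_mb(30, 400, shared_mb=390)` = 690 MB. That is the optimistic case: as noted later in the thread, Ruby 1.9.3 was not copy-on-write friendly, so much of the "shared" memory could end up copied anyway.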

As fast as Twilio can accept and process without throttling; beyond that, it's not much use.

The benefit of Unicorn's forking is overrated; we used it and didn't see much benefit for long-running processes.

Sidekiq is a good alternative, but it means some rewriting (for our app, anyway). Secondly, Sidekiq looks mature today, but I started working on some of these changes 2 years ago.

Can you explain why using Sidekiq involves a rewrite? AFAIK, with Sidekiq you just have to make sure that your jobs are thread safe, not the whole app, which is not very hard.

2 years ago, Ruby was not copy-on-write (COW) friendly. So yeah, there was not much benefit to forking if you were using 1.9.3. Not sure how well Ruby 2.x fares in that respect.

How is Unicorn forking relevant in this context? Since they had memory usage problems with workers, I assumed they were using Resque (which uses forking) or Delayed Job.
Did you try Sidekiq?

Btw, how are you generating the PDF from HTML, and are you able to split a single HTML document into multiple PDFs?

wkhtmltopdf and phantomjs both worked similarly; currently I'm using phantomjs.

And I'm not splitting the PDF; I'm splitting the HTML generation workload, then creating individual PDFs from those HTML chunks. Then they are joined together (using pdfunite). I found this much faster than joining the HTML and generating one large PDF.
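A rough Python sketch of the split, render, and join pipeline described above. It swaps in wkhtmltopdf for rendering, because PhantomJS needs a separate rasterizing script; `build_pdf`, the chunk size, and the bare `<html><body>` wrapper are assumptions (real chunks would need the shared stylesheet and page setup). `pdfunite` takes the source PDFs first and the destination file last.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_cmd(html_path, pdf_path):
    # Basic wkhtmltopdf usage: wkhtmltopdf <input> <output>
    return ["wkhtmltopdf", str(html_path), str(pdf_path)]


def unite_cmd(pdf_paths, out_path):
    # pdfunite (poppler-utils): source PDFs first, destination last
    return ["pdfunite", *(str(p) for p in pdf_paths), str(out_path)]


def build_pdf(fragments, workdir, out_path, per_chunk=50, parallel=4):
    """Render HTML fragments in chunks, in parallel, then join the PDFs in order."""
    workdir = Path(workdir)
    part_pdfs, commands = [], []
    for i, group in enumerate(chunk(fragments, per_chunk)):
        # Zero-padded names so the parts also sort correctly on disk.
        html = workdir / f"part-{i:04d}.html"
        pdf = workdir / f"part-{i:04d}.pdf"
        html.write_text("<html><body>" + "".join(group) + "</body></html>",
                        encoding="utf-8")
        part_pdfs.append(pdf)
        commands.append(render_cmd(html, pdf))

    # Each render is its own OS process, so a thread pool is enough to
    # run several at once; check=True surfaces any failed render.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))

    subprocess.run(unite_cmd(part_pdfs, out_path), check=True)
    return out_path
```

The parts are joined in list order, so the final PDF follows the original fragment order regardless of which chunk finishes rendering first.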

Ok. Are you using phantomjs 1 or 2? Any reason to choose phantomjs over wkhtmltopdf? We are using wkhtmltopdf because it creates a table of contents for PDFs and also clickable links.

I feel uncomfortable reading your post. It comes across as passive-aggressive, attacking the OP's architectural decisions and assuming he had no clue how to build a proper Ruby system. I am not a fan of Go, not at all, but I know that Ruby has significant issues, which the OP also outlined quite well, and you seem to just ignore them.

But Paul Graham knows exactly why you are sticking to the past (just replace 'Blub' by 'Ruby'):

> As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

> When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn't even have y.

> By induction, the only programmers in a position to see all the differences in power between the various languages are those who understand the most powerful one. (This is probably what Eric Raymond meant about Lisp making you a better programmer.) You can't trust the opinions of the others, because of the Blub paradox: they're satisfied with whatever language they happen to use, because it dictates the way they think about programs.

Source: http://www.paulgraham.com/avg.html

I certainly didn't intend to be passive aggressive or attack, apologies if it came over like that.

The point remains though; it's not that the poster doesn't know how to build a 'proper' system in Ruby, but that the frequent articles we see with the theme "we rewrote our system from language X to language Y and it's much better!" are rarely helpful, because (I can't stress this enough) architectural design is vastly more important than what language you are using to implement your system.

This leads to a dangerous bandwagon-jumping faddism where developers start jumping on to the next big thing, because they assume it will solve their problems. We saw exactly that with Ruby, for example; developers assumed that they could escape from the verbosity and enterpriseness of Java just by changing language, ignoring the pitfalls that could be experienced.

PG's 'blub' example is a very useful allegory, but it doesn't apply here. I'm not at all suggesting that Ruby is the perfect solution to all problems, or that using Go is wrong – I use both of them! Just that saying 'Go is better than Ruby for writing web apps because we reimplemented everything and it was faster' is not helpful.

This is a very good point. And this:

> We saw exactly that with Ruby, for example; developers assumed that they could escape from the verbosity and enterpriseness of Java just by changing language, ignoring the pitfalls that could be experienced.

This worked for a lot of people until they got to the architectural stuff. But going from Java to Ruby (infinitely nicer) is distinct from going from J2EE to Rails (easier to get started, harder to keep going).

Please take this in the spirit intended ;-)

So here we've got a Ruby/Blub programmer looking down the power continuum at Go and seeing language deficiencies. He looks up the power continuum and sees maybe Haskell, Clojure, etc. Since Go was created[1] to be down the power continuum and contains nothing weird, new, or even particularly different, I don't think the adage holds.

[1] citation needed

Seriously, why do you think Go was created up the power continuum? It may be, but what features or idioms (barring channels) put it "up there?"

What is so magic about a compiler feature generally available in all compilers that target native code?

Magic in the sense that someone coming from building Ruby apps will find it amazing to skip over dealing with the deployment process for a scripting language.

While it's true that there are plenty of projects and services that reduce the pain of that, I'm certain that everyone who's ever built a Ruby app has experienced deployment hell at least once.

>Rewriting a monolithic platform as a set of microservices will almost always result in improvements from every perspective

That is one dangerous assumption you made there, mister. Performance (response times, IO waits, etc.) can be severely affected in distributed designs.
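A back-of-envelope way to see the concern: work that used to be an in-process function call now pays network and serialization overhead on every hop, and hops that depend on each other add up. This Python sketch ignores queuing, retries, and tail latency, and the numbers are illustrative rather than measured; the function names are hypothetical.

```python
def sequential_latency_ms(hop_latencies_ms):
    """Calls that depend on each other run one after another: latencies add."""
    return sum(hop_latencies_ms)


def fan_out_latency_ms(hop_latencies_ms):
    """Independent calls issued concurrently: the slowest one dominates."""
    return max(hop_latencies_ms, default=0)


# Illustrative numbers only: four service calls, each ~10 ms of work
# plus ~5 ms of network and serialization overhead.
hops = [15, 15, 15, 15]
print(sequential_latency_ms(hops), fan_out_latency_ms(hops))  # 60 15
```

Issuing independent calls concurrently recovers much of the loss, but a chain of dependent calls cannot be parallelized away.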

> That is one dangerous assumption you made there, mister. Performance (response times, IO waits, etc.) can be severely affected in distributed designs.

That's why there's an 'almost' in there. You're right that distributed systems have a different set of challenges, but if you're developing a system that requires the kind of incredibly fast response times that will stress a distributed system, you're already into territory where general 'this is good guidance for web apps' advice doesn't apply to you.

Fair enough, you have a point there because I skipped the "almost" from your previous comment somehow.