by patio11 4124 days ago
This is a function of how Rails does concurrency, which is by using multiple processes. Each additional process consumes X MB. If you have something which needs a throughput of 100 requests per second, and each request takes 0.2 seconds, you require a minimum of 20 Rails processes and you probably want to have more.

The X is largely dependent on how many dependencies your app pulls in, on your garbage collection settings, and on how much state you carry around with you. I don't know what an off-the-shelf Rails app requires as a minimum, but as a comparable, BCC requires ~120 MB per process and AR requires ~200 MB. Neither is a particularly mammoth code base.

Thus, if AR develops a business requirement for 100 requests per second on something which takes 200 ms per request, my server budget just increased by 4 GB of RAM. That is absolutely unavoidable in the Rails concurrency model. It doesn't matter how well you code the small thing at issue: you pay a memory tax based on desired maximum volume and app complexity, always.
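To make that arithmetic concrete, here is a rough sketch in Python (not Rails code). The function names are made up for illustration, and the 200 MB figure is the AR number quoted above; it only counts the bare minimum and ignores the headroom you would want in practice.

```python
import math


def processes_needed(requests_per_second: int, ms_per_request: int) -> int:
    """Minimum number of single-request processes to keep up with the load.

    Requests in flight at once = arrival rate * time per request,
    and each process handles one request at a time.
    """
    return math.ceil(requests_per_second * ms_per_request / 1000)


def memory_needed_mb(processes: int, mb_per_process: int) -> int:
    """Total RAM if every process carries its own copy of the app."""
    return processes * mb_per_process


procs = processes_needed(100, 200)        # 100 req/s at 200 ms each
total_mb = memory_needed_mb(procs, 200)   # ~200 MB per process (AR figure)
print(procs, total_mb)                    # prints: 20 4000
```

That is the "4 GB" above: 20 processes at roughly 200 MB each.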

(Neither here nor there, but AR has features quite similar to the notification feature of the app described in this article. We handle this by using queue workers -- which each require 200 MB, as described above -- and the decision that queues do not require immediate servicing. Instead, if thousands of calls are dumped into the system, we add them to a queue quickly and then consume the queue leisurely. Different apps are different apps, but I'd wager a guess that we do substantially more volume in terms of calls on ~400 MB of RAM.)
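A minimal sketch of the "enqueue quickly, consume leisurely" idea, in Python rather than Ruby. Everything here is an assumption for illustration: `run_batch` is a hypothetical name, threads stand in for the queue worker processes described above, and appending to a list stands in for actually placing a call.

```python
import queue
import threading


def run_batch(calls, worker_count=2):
    """Queue every call up front, then drain with a small fixed worker pool."""
    pending = queue.Queue()
    for call in calls:
        pending.put(call)  # enqueueing is cheap, so a burst is absorbed fast

    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                call = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is finished
            with lock:
                done.append(call)  # stand-in for the real (slow) work

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return done


handled = run_batch(range(1000), worker_count=2)
print(len(handled))  # prints: 1000
```

The worker count stays fixed no matter how large the burst is; a bigger burst only means the queue takes longer to drain.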

Also neither here nor there, but engineers are expensive and gigabytes are cheap. There exist many circumstances in which the architecture which requires 12 GB of RAM is a defensible engineering tradeoff versus the one which takes 5 MB. As Thomas told me once when I was asking him how to cut 100 MB off of Redis so that I could avoid upgrading a VPS by 1 GB: "How much does the extra gig cost? $10? Why are we even having this discussion?" And he was right.

3 comments

"Also neither here nor there, but engineers are expensive and gigabytes are cheap. There exist many circumstances in which the architecture which requires 12 GB of RAM is a defensible engineering tradeoff versus the one which takes 5 MB."

This is a vital point — everybody wants to shoot for lovely, elegant systems that use minimal resources. But resources are astoundingly cheap; we've got a bunch of servers with 128GB of RAM, and you can get these for around £100 a month each. That's less than two hours of developer time. If using Ruby in place of another language saves you that much time—or even just makes you a little happier—it's probably worth it.

This is a nitpick, but this isn't entirely accurate:

"This is a function of how Rails does concurrency, which is by using multiple processes."

Multi-process vs multi-threaded is an application server architecture matter, not a Rails matter. Yes, your application must be thread safe, but this has been available in some form for Rails apps for a long time now [1]. The biggest problem in recent history has been Gems and their thread-safetyness (is that a word?). The number of threaded Ruby application servers has been on the rise for a long while now, and even Passenger (traditionally the multi-process front-runner) is going threaded in version 4.0.

1: An interesting blog post related to Config.threadsafe! (which was a huge source of confusion): http://tenderlovemaking.com/2012/06/18/removing-config-threa...

Thanks, you explained it very well.

Edit: Our requirement was to process this queue as fast as possible, and that means more workers. With process-based concurrency that is very costly, as you have explained.

Yeah, everyone wants to process their queue as fast as possible but "as fast as possible" practically means a cap on the maximum allowed delay. Otherwise, why stop at 30 workers? Go for 300. 3000?
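To put numbers on the "cap on the maximum allowed delay" point, here is a back-of-the-envelope sketch in Python. The function name and the example figures (3000 queued calls at 2 seconds each) are assumptions picked for illustration, not numbers from the article, and it assumes the work parallelizes perfectly and nothing downstream throttles.

```python
def workers_for_deadline(jobs: int, ms_per_job: int, max_delay_ms: int) -> int:
    """Fewest workers that can drain `jobs` within `max_delay_ms`."""
    total_work_ms = jobs * ms_per_job
    # Integer ceiling division: round up to a whole number of workers.
    return (total_work_ms + max_delay_ms - 1) // max_delay_ms


# Hypothetical burst: 3000 calls, 2 s each.
print(workers_for_deadline(3000, 2000, 200_000))  # within 200 s: prints 30
print(workers_for_deadline(3000, 2000, 20_000))   # within 20 s:  prints 300
```

The worker count falls out of the deadline you pick, so the real question is what delay is acceptable.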

Also, if the workers shared all the code, you could have used unicorn to fork the processes after the code loading was complete. The 400MB per process would then instantly come down to something like ~10MB per process, at which point rewriting would have been delayed for another year or so.

As fast as Twilio can accept and process without throttling; beyond that it's not much use.

The Unicorn forking benefit is overrated. We used it and we don't see much benefit for long-running processes.

Sidekiq is a good alternative, but that means some rewrite (for our app anyway). Secondly, Sidekiq looks mature today, but I started working on some of these changes 2 years ago.

Can you explain why using Sidekiq involves a rewrite? AFAIK, using Sidekiq you just have to make sure that your jobs are threadsafe, not the whole app, which is not very hard.

2 years ago, Ruby was not COW friendly. So yeah, there was not much benefit to forking if you were using 1.9.3. Not sure how well Ruby 2.x fares in that respect.

You have to make all the code that executes from a job threadsafe, or you have to decouple the job code from the app code (which would probably be required anyway, because I'm not sure Sidekiq supports old Rubies).

So, in any case, you have to rewrite as much code as I rewrote in Go and decoupled from the main app. (It's not a lot of code; I mentioned this in the talk.)

How is unicorn forking relevant in this context? Since they had memory usage problems with workers, I assumed they were using Resque (which uses forking) or Delayed Job.
Did you try Sidekiq?

Btw, how are you generating the PDF from HTML, and are you able to split a single HTML into multiple PDFs?

wkhtmltopdf and phantomjs both worked similarly; currently I'm using phantomjs.

And I'm not splitting the PDF but splitting the HTML generation workload, and then creating individual PDFs from those HTML chunks. They are then joined together (using pdfunite). I found this much faster than joining the HTML and generating one large PDF.
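A rough sketch of that split-then-join flow in Python (the real thing was not written this way). The helper names and file names are hypothetical, and the HTML-to-PDF rendering step is left out entirely; the only real tool assumed is the pdfunite command line, which takes the input PDFs followed by the output file.

```python
def chunk(items, size):
    """Split the workload into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def pdfunite_command(part_pdfs, output_pdf):
    """Build the argv for: pdfunite part1.pdf part2.pdf ... output.pdf"""
    return ["pdfunite", *part_pdfs, output_pdf]


records = list(range(5))                  # stand-in for rows of a report
parts = chunk(records, 2)                 # [[0, 1], [2, 3], [4]]
# Each chunk would be rendered to its own HTML and then PDF (not shown).
part_files = [f"part{n}.pdf" for n in range(len(parts))]
cmd = pdfunite_command(part_files, "report.pdf")
print(cmd)
# prints: ['pdfunite', 'part0.pdf', 'part1.pdf', 'part2.pdf', 'report.pdf']
```

The list could be handed to `subprocess.run(cmd, check=True)` once the part files exist; the chunks themselves can be rendered in parallel since they do not depend on each other.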

Ok. Are you using phantomjs 1 or 2? Any reason to prefer phantomjs over wkhtmltopdf? We are using wkhtmltopdf because it creates a table of contents for PDFs and also clickable links.

PhantomJS 1. But nothing is tied to phantomjs; wkhtmltopdf should work as well.

(I'm planning to test with PhantomJS 2)