by pramodliv1 3772 days ago
I'm running Python + Django with uWSGI. Building the app is a pleasure. With type hints in Python 3.5, the code becomes almost as maintainable as code in a statically typed language too.

One problem I have is that the application is extremely CPU intensive. I can't get past 35-40 requests per second with 500 concurrent users (at 4 CPU cores, 14 GB RAM), which seems too expensive economically. I have already cached as much data as possible, both at the Nginx tier and with Redis, and tuned the number of uWSGI worker processes.

Do I have to try other languages or do you think I have more room for optimization with Python?

2 comments

Presuming you've actually profiled it to know it's CPU intensive, or you're in a field where this 'goes without saying' (e.g. you're doing a bunch of math calculations within Python)...

Then I'd suggest you use Cython for a speedup, or try PyPy, which can be 200% faster without any code changes.

When I used Python and worked with financial data, every live algorithm would be recoded as a Cython extension.
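For the profiling step mentioned above, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. `slow_view` and `profile_call` are hypothetical names invented for illustration. In a real Django app you would wrap an actual view or use a profiling middleware instead.

```python
import cProfile
import io
import pstats


def slow_view(n):
    # Hypothetical stand-in for a CPU-heavy request handler.
    total = 0
    for _ in range(n):
        total += sum(j * j for j in range(50))
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(slow_view, 2000)
print(report)
```

The report lists where the time actually goes, which is what tells you whether Cython or PyPy is worth trying for a given function.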

Thanks for the suggestions!

Without a doubt you can serve more traffic than that with Python. More than likely, the bottleneck isn't your language. There's a good chance that if all you did was port your codebase to another language, you'd have the same basic usage profile, maybe +/-50%.

Having done web development for over a decade, my experience is that the algorithm is far more important than the language. After that, data structures trump language. After that, not doing work you don't have to do trumps language.

First, look for really ugly nested loops in your code. I can't tell you the number of times this type of thing ends up in a codebase, especially since dev environments often have only a small subset of the data. Even an O(n!) algorithm is fast for small values of n.
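To illustrate the nested-loop point, here is a sketch of the same join written two ways. The data shapes and the function names `match_orders_nested` and `match_orders_indexed` are made up for the example, and it assumes customer ids are unique.

```python
def match_orders_nested(orders, customers):
    # O(len(orders) * len(customers)): scans every customer for every order.
    matched = []
    for order in orders:
        for customer in customers:
            if order["customer_id"] == customer["id"]:
                matched.append((order["id"], customer["name"]))
    return matched


def match_orders_indexed(orders, customers):
    # O(len(orders) + len(customers)): build the index once, then look up.
    # Assumes customer ids are unique.
    name_by_id = {customer["id"]: customer["name"] for customer in customers}
    return [
        (order["id"], name_by_id[order["customer_id"]])
        for order in orders
        if order["customer_id"] in name_by_id
    ]


# Hypothetical sample data: some orders point at customers that do not exist.
customers = [{"id": i, "name": f"customer-{i}"} for i in range(200)]
orders = [{"id": i, "customer_id": i % 250} for i in range(1000)]

nested_result = match_orders_nested(orders, customers)
indexed_result = match_orders_indexed(orders, customers)
```

With a handful of rows in a dev database both versions feel instant. The difference only shows up once the tables grow.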

Next, look at what kinds of data structures you're using. For instance, dicts are a seductive way to store data. They have a theoretical O(1) lookup, but they have the drawback of scattering memory accesses, which means that repeated lookups in a large dict can cause lots of cache misses, and a lookup that misses the cache can be far slower than one that hits it. Meanwhile, a tuple has O(n) lookups, but if you find that you mostly need to iterate over the data, you're going to benefit from memory locality. So know your tradeoffs. Also, code that had one usage profile at launch can have a very different usage profile a year later, so it doesn't hurt to revisit this occasionally.
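The tradeoff is easiest to judge by measuring your own access pattern. Below is a rough sketch using the standard-library `timeit` module. The data, sizes, and function names are assumptions for illustration, and the timings will vary by machine and Python version, so no expected numbers are shown.

```python
import timeit

N = 50_000
pairs = tuple((i, i * 2) for i in range(N))
by_key = dict(pairs)


def lookup_in_dict(key):
    # Hash lookup: average O(1).
    return by_key[key]


def lookup_in_tuple(key):
    # Linear scan: O(n) in the worst case.
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


def sum_values_dict():
    # Full iteration over the dict's values.
    return sum(by_key.values())


def sum_values_tuple():
    # Full iteration over the tuple of pairs.
    return sum(v for _, v in pairs)


if __name__ == "__main__":
    cases = [
        ("dict keyed lookup", lambda: lookup_in_dict(N - 1)),
        ("tuple keyed lookup", lambda: lookup_in_tuple(N - 1)),
        ("dict full iteration", sum_values_dict),
        ("tuple full iteration", sum_values_tuple),
    ]
    for label, func in cases:
        seconds = timeit.timeit(func, number=20)
        print(f"{label}: {seconds:.4f}s for 20 runs")
```

Run it against data that resembles production in size, since the result on a small dev dataset can point the wrong way.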

Last, look for code that isn't doing anything useful. Are you calling the database twice in a row, asking for the same data, or fetching data and not using it? It sounds obvious, but over time these kinds of things have a way of showing up in a codebase, and they can really add up.
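One way to catch and remove repeated identical queries is a small cache that lives only for the duration of one request. This is a sketch, not a Django API: `RequestCache` and `fetch_user` are hypothetical names, and `fetch_user` just records its calls in place of a real database query.

```python
class RequestCache:
    """Memoizes query results for the lifetime of one request."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._results = {}

    def get(self, key):
        # Only hit the underlying fetch function the first time a key is seen.
        if key not in self._results:
            self._results[key] = self._fetch(key)
        return self._results[key]


calls = []


def fetch_user(user_id):
    # Hypothetical stand-in for a database query; records each call.
    calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = RequestCache(fetch_user)
first = cache.get(7)
second = cache.get(7)  # served from the cache, no second query
print(len(calls))
```

Creating a fresh `RequestCache` per request keeps the data from going stale across requests, at the cost of not sharing results between them.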

I should also mention caching at this point. It can be incredibly hard to get caching right, but the performance savings are pretty big. It falls into the bucket of not doing things that you don't need to do. Do whatever you can to enable page caching. Putting an nginx cache in front of the web server serving Python for the majority of your pages could easily get you to hundreds of requests per second.
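The idea behind page caching can be sketched in a few lines of Python. This is an in-process illustration of time-based expiry only, not a replacement for an nginx cache. `PageCache`, `get_or_render`, and `render_home` are hypothetical names, and the clock is injectable so the expiry behavior can be checked without sleeping.

```python
import time


class PageCache:
    """Keeps rendered pages for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (expires_at, body)

    def get_or_render(self, path, render):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            # Still fresh: skip rendering entirely.
            return entry[1]
        body = render(path)
        self._entries[path] = (now + self._ttl, body)
        return body


render_count = 0


def render_home(path):
    # Hypothetical stand-in for an expensive view.
    global render_count
    render_count += 1
    return f"<html>{path} rendered {render_count} time(s)</html>"


fake_now = [0.0]
page_cache = PageCache(ttl_seconds=60, clock=lambda: fake_now[0])

page_a = page_cache.get_or_render("/home", render_home)
page_b = page_cache.get_or_render("/home", render_home)  # cache hit
fake_now[0] = 61.0  # pretend 61 seconds have passed
page_c = page_cache.get_or_render("/home", render_home)  # expired, re-rendered
```

The hard part the comment alludes to is choosing what may be served stale and for how long. The expiry mechanism itself is the easy bit.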