Hacker News
by game_the0ry 626 days ago
I am fairly confident I will get some downvotes for this, but here goes...

When I am trying to solve a technical problem, the problem is going to dictate my choice of tooling.

If I am doing some fast scripting or I need to write some glue code, Python is my go-to. But if I need resource efficiency, multithreading, non-blocking async I/O, and/or high performance, I would not consider Python; I would probably pick the JVM over the best Python option.

Don't get me wrong, I think it's worthwhile to explore this, and I certainly do not think it's a wasted effort (quite the opposite, this gets my upvote). I just don't think I would ever use it if I had a use case that needed performance and resource efficiency.


Surely your comment only really makes sense from the point of view of greenfield or hobbyist projects? If you were working for an organization with hundreds of thousands of lines of Python already doing something important, then your last sentence doesn't hold, right?
Sure, this works sometimes. But sometimes you have mountains of code and infrastructure dedicated to one platform, and it's worth the effort to round off the occasional square peg in the interest of operational simplicity/consistency.

I've been using ThreadPoolExecutors in Python for a while now. They seem to work pretty well for my use cases. Granted, my use cases don't require things like shared memory segments; I use the as_* functions in concurrent.futures to recombine the data as needed. Honestly, I prefer the futures functions because I don't need to think about deadlocks.
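For anyone who hasn't used this pattern, here is a minimal sketch of it, assuming I/O-bound work. `fetch_chunk` and `gather_chunks` are hypothetical names, and `fetch_chunk` just sleeps in place of a real network or disk call; `as_completed` is the `concurrent.futures` helper used to collect results. Each future is mapped back to its input so the results can be recombined in order without any locks in user code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_chunk(chunk_id: int) -> list[int]:
    """Hypothetical I/O-bound task: sleep stands in for a network or disk call."""
    time.sleep(0.05)  # the GIL is released while sleeping, so threads overlap
    return [chunk_id * 10 + i for i in range(3)]


def gather_chunks(chunk_ids: list[int], max_workers: int = 8) -> list[int]:
    results: dict[int, list[int]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the chunk that produced it.
        futures = {pool.submit(fetch_chunk, cid): cid for cid in chunk_ids}
        # as_completed yields futures in completion order, not submission order.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # only the main thread writes here
    # Recombine in the original order.
    return [x for cid in chunk_ids for x in results[cid]]


if __name__ == "__main__":
    print(gather_chunks([0, 1, 2]))  # [0, 1, 2, 10, 11, 12, 20, 21, 22]
```

Because workers only return values and never touch shared state, there is nothing to deadlock on; the trade-off is that all recombination happens in the calling thread.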

> Sure, this works sometimes. But sometimes you have mountains of code and infrastructure dedicated to one platform, and it's worth the effort to round off the occasional square peg in the interest of operational simplicity/consistency.

I agree; this is a fair trade-off, but it's not the direction I would go as a matter of preference.

What's the alternative, though?

1. Rewrite the whole thing.
2. Carve out the high-perf component into a separate system and also deal with the overhead of marshalling data between two different systems?

The main objective is the in-between land, where you need 10x-50x the performance of Python, not 500x the performance on some parallelizable workload.

And in many teams, just having to worry about python makes it easier to keep team members productive if they're not expected to handle several different languages productively.

Fair point. Though I will push back on...

> And in many teams, just having to worry about python makes it easier to keep team members productive if they're not expected to handle several different languages productively.

I think this makes sense for individuals and teams, but for an org or company I think specialist teams make sense: teams that need performance use the JVM, and teams building business software, DevOps tooling, or anything else that isn't performance-critical use Python.

JVM-based development has a place for some teams. Others need to go all the way to C++/C/Rust etc. for the performance they need.

But plenty of tasks can be done beautifully in Python. That's especially true in a data processing or ML setting, where most of the heavy lifting is done in libraries such as NumPy, Spark, PyTorch, etc. (Python is also the industry standard for such teams.)

Still, even for such teams there are times where you want to do SOME heavier compute tasks within the core language, and offloading this to some other dev team simply doesn't scale.

The usual solution is to use multiprocessing instead of multithreading, but this workaround is quite inflexible.
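As a rough illustration of that workaround, here is a minimal sketch using `concurrent.futures.ProcessPoolExecutor`. `count_primes` and `parallel_counts` are hypothetical names, and `count_primes` is just a stand-in CPU-bound task. Every argument and result is pickled across the process boundary, and the worker has to be a picklable module-level function; those constraints are a large part of why the approach feels inflexible compared with threads sharing objects.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_counts(limits: list[int], workers: int = 4) -> list[int]:
    # Each task runs in a separate interpreter process with its own GIL,
    # so CPU-bound work can use several cores. Arguments and return values
    # are pickled to cross the process boundary.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


# The guard is required when workers are started with "spawn" (the default on
# Windows and macOS), because each worker re-imports this module.
if __name__ == "__main__":
    print(parallel_counts([10, 100, 1000]))  # [4, 25, 168]
```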

Some dev teams may have developers who can deliver this in Scala (especially if Spark is involved). Others may have the ability to build C++ (or CUDA) libraries to add to the Python environment.

But being able to run somewhat heavier processing within Python itself, beyond what a single process can achieve, is often much better.

Cost-wise it also makes a lot of sense. Such steps often end up on a large compute cluster where tens or hundreds of processors (or more) are available. If a single step in a processing pipeline on such a cluster can be cut from 2 hours to 1 minute, that can be a large saving. Taking it from 1 minute to 10 seconds means a lot less.
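To make that arithmetic explicit, here is a small sketch comparing the absolute time saved by each speedup. The 100-core allocation is a hypothetical figure, not from the comment; it only shows that savings measured in core-hours scale with the number of processors the step occupies.

```python
def savings(before_s: float, after_s: float, cores: int = 1) -> tuple[float, float]:
    """Return (speedup factor, core-seconds saved) for one pipeline step."""
    return before_s / after_s, (before_s - after_s) * cores


# Hypothetical allocation: the step reserves 100 cores while it runs.
CORES = 100

big = savings(2 * 3600, 60, CORES)  # 2 hours -> 1 minute
small = savings(60, 10, CORES)      # 1 minute -> 10 seconds

print(f"2h -> 1min: {big[0]:.0f}x faster, {big[1] / 3600:.1f} core-hours saved")
print(f"1min -> 10s: {small[0]:.0f}x faster, {small[1] / 3600:.2f} core-hours saved")
# 2h -> 1min: 120x faster, 198.3 core-hours saved
# 1min -> 10s: 6x faster, 1.39 core-hours saved
```

The second step is still a respectable 6x speedup, but it saves well under 1% of what the first one does, which is the diminishing return the comment describes.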

Btw, and with all due respect, the part about teams that require perf using JVM doesn't really match my experience. Where I come from, the Java devs tend to produce the slowest code of all, mostly because every step of the processing is serialized/deserialized as microservices talk to each other for each data element.

Even Python-based code is often faster (sometimes by orders of magnitude). Partly this is because of cultural differences between teams (the Python code, even when exposed as microservices, tends to work with larger blocks of data), and partly because the data really is processed in C++-based libraries within Python, which still have a significant edge over JVM-based code.
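To illustrate the serialization point, here is a toy sketch that counts encode/decode round trips instead of measuring a real network. `FakeService` and its `call_service` method are hypothetical stand-ins for one microservice hop that JSON-encodes its payload; reading "larger blocks" as simple batching is an assumption, not a description of either team's actual code.

```python
import json


class FakeService:
    """Hypothetical microservice hop: every call serializes and deserializes its payload."""

    def __init__(self) -> None:
        self.round_trips = 0

    def call_service(self, payload):
        self.round_trips += 1
        wire = json.dumps(payload)  # serialize on the caller side
        data = json.loads(wire)     # deserialize on the service side
        if isinstance(data, list):
            return [x * 2 for x in data]
        return data * 2


def per_element(svc: FakeService, items: list[int]) -> list[int]:
    # One hop per data element: n round trips.
    return [svc.call_service(x) for x in items]


def batched(svc: FakeService, items: list[int], batch_size: int = 1000) -> list[int]:
    # One hop per block of elements: ceil(n / batch_size) round trips.
    out: list[int] = []
    for start in range(0, len(items), batch_size):
        out.extend(svc.call_service(items[start:start + batch_size]))
    return out


if __name__ == "__main__":
    items = list(range(10_000))
    a, b = FakeService(), FakeService()
    assert per_element(a, items) == batched(b, items)
    print(a.round_trips, b.round_trips)  # 10000 10
```

Both strategies produce the same result; the per-element version simply pays the fixed per-hop cost (serialization, plus network latency in a real system) a thousand times more often.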

Don't get me wrong: Java has a lot of advantages for many types of business applications, where the business logic complexity can be abstracted in well organized and standardized ways. But it's not typically the go-to language when seeking maximum performance in heavy compute or massive data volume scenarios.

It leads to other problems. A lot of orgs have specifically moved away from specialist teams towards teams united by the business mission.
The problem is that your example is not what most companies using Python are facing, and those companies account for the majority of Python code. They want some kind of performance uplift without rewriting the whole python code base. It's cheaper if Python keeps getting some kind of upgrade.

An example is Facebook's PHP-to-Hack compiler.

The use case I had in mind when I wrote it is FastAPI. In that case, there wouldn't be any change to the Python code. You'd just use a different ASGI server that uses this sort of multithreaded event loop. So instead of running it with uvicorn main:app, you'd run it with alternateASGI main:app.

I have an example of a very basic ASGI server that does just that towards the end of the blog post.
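For readers without the blog post at hand, here is a rough sketch of the idea, not the author's server: a plain ASGI app (FastAPI apps are called through the same `(scope, receive, send)` interface) driven by one asyncio event loop per thread. `app`, `run_request`, and `serve_in_threads` are hypothetical names, no sockets are opened, and requests are fed in from memory. On a standard CPython build the threads take turns holding the GIL; actual parallelism would need a free-threaded build.

```python
import asyncio
import threading


async def app(scope, receive, send):
    """Minimal ASGI HTTP app; a FastAPI app is called through this same interface."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan/websocket scopes
    await receive()  # consume the (empty) request body
    body = f"hello from {threading.current_thread().name}".encode()
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": body})


async def run_request(asgi_app, path: str) -> list:
    """Drive one request through the app with in-memory receive/send callables."""
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app(scope, receive, send)
    return sent


def serve_in_threads(asgi_app, paths: list, n_threads: int = 4) -> list:
    """Hypothetical multithreaded driver: one asyncio event loop per thread."""
    results = [None] * len(paths)

    def worker(thread_idx: int) -> None:
        async def handle_share():
            # This thread handles every n_threads-th request, one at a time.
            for i in range(thread_idx, len(paths), n_threads):
                results[i] = await run_request(asgi_app, paths[i])

        asyncio.run(handle_share())  # asyncio.run creates a fresh loop in this thread

    threads = [threading.Thread(target=worker, args=(t,), name=f"loop-{t}")
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `serve_in_threads(app, ["/a", "/b"], n_threads=2)` returns one list of ASGI messages per path. A real server would instead accept socket connections inside each loop, but the app itself would not change, which is the point about leaving the Python code untouched.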

Just playing devil's advocate here.

> They want some kind of performance uplift without rewriting the whole python code base.

In order to take advantage of multithreading and/or async I/O, you would need to rewrite your code anyway, right? And at that point, wouldn't rewriting in a different language be an option?

> you would need to rewrite your code anyway, right?

Heavily restructure, sure. Rewrite? Probably not.

Not really. The engineering effort of rewriting the entire codebase into a different language is astronomical. Besides mapping the logic to the new language, think about all the quirks between languages that you need to deal with. In the worst cases, you have to come up with entirely new code. Nobody wants to pay for that.

Upgrading the language, however, is way easier, and there's usually an official upgrade guide telling you what to do. It's also much safer, and easier to deploy and test.

Once we have a sane multithreading path in Python, there will be even less incentive to rewrite the code.

> Not really. The engineering effort of rewriting the entire codebase into a different language is astronomical.

Not to mention that Python is actually a good language choice for many types of environments, and basically the industry standard for fields like ML/AI and supporting data pipelines.

Wherever Python is used for heavy-duty number crunching or large data volumes, most processing is handled by libraries written in other languages, while Python handles program flow and the small parts that need custom code. That latter part can currently be quite expensive.

Migrating the whole codebase to another language for such setups would simply be absurd.

Still, for the small percentage of such codebases that DOES do semi-heavy data crunching, real multithreading would be nice, so one can avoid resorting to multiprocessing or implementing these parts as custom C++ libraries, or similar.
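For contrast with the multiprocessing route, here is a sketch of what the plain-threads version of a semi-heavy step might look like. It assumes a free-threaded CPython 3.13+ build for any actual speedup; on a standard build the code is still correct, but the threads take turns holding the GIL. `sys._is_gil_enabled()` is a private helper added in 3.13, so the code falls back when it is missing. `checksum_block` and `parallel_checksum` are hypothetical names for a pure-Python workload.

```python
import sys
from concurrent.futures import ThreadPoolExecutor

MOD = 1_000_003


def gil_enabled() -> bool:
    # sys._is_gil_enabled() exists from CPython 3.13; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def checksum_block(block: range) -> int:
    """Hypothetical pure-Python number crunching over one block of data."""
    total = 0
    for n in block:
        total = (total + n * n) % MOD
    return total


def parallel_checksum(n: int, workers: int = 4) -> int:
    # Split [0, n) into contiguous blocks; threads share memory, so nothing is pickled.
    step = max(1, (n + workers - 1) // workers)
    blocks = [range(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(checksum_block, blocks))
    # Adding partial sums mod MOD gives the same result as one serial pass.
    return sum(partials) % MOD


if __name__ == "__main__":
    print("GIL enabled:", gil_enabled())
    print(parallel_checksum(1_000_000))
```

Unlike the process-based version, the worker could read large shared structures directly instead of receiving pickled copies, which is exactly the flexibility the comment is asking for.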

Hard agree. If you want resource efficiency and high performance, you're probably better off looking to lower-level languages most of the time. In my experience FastAPI usually gets used by teams that need a server done quickly and simply, or that are constrained by a lack of experience in low-level languages. That being said, I do think it's worthwhile trying to improve efficiency slightly even for these cases.