Hacker News
by jacob019 1649 days ago
I'm sick of everyone hating on Python's speed. I/O is the bottleneck. By my estimate, 95% of applications spend <5% of their response time executing Python. Maybe your application is compute-heavy and better written in Rust, but Python gets shit done with minimal effort, and the extra cycles are rarely an issue in most workloads.
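One rough way to check that 95%/5% estimate against your own app is to compare CPU time with wall-clock time around a request. A minimal stdlib sketch; `cpu_share` is a hypothetical helper, and `time.sleep` stands in for waiting on the database or network. Caveats: `time.process_time()` counts CPU for the whole process (all threads), so this only means something when the measured block is the only busy work, and time spent inside C extensions (e.g. a driver parsing rows) counts as CPU too. A profiler like `cProfile` gives a finer breakdown.

```python
import time
from contextlib import contextmanager


@contextmanager
def cpu_share(report: dict):
    """Record wall time, CPU time, and their ratio for the enclosed block.

    time.process_time() covers the whole process (all threads), so the
    ratio is only meaningful when this block is the only busy work running.
    """
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    try:
        yield report
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        report["wall_s"] = wall
        report["cpu_s"] = cpu
        report["cpu_share"] = cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    stats = {}
    with cpu_share(stats):
        time.sleep(0.2)        # stands in for waiting on the DB or network
        sum(range(100_000))    # stands in for the Python work done per request
    print(f"{stats['cpu_share']:.1%} of wall time spent on CPU")
```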

I write a LOT of Python. I’m not hating on it.

You can definitely outrun I/O but you can never outrun the GIL.

Sure, I/O is often the bottleneck, but the Python interpreter can definitely add a lot of overhead, and that can add a lot of operational cost. Personally, even for I/O-bound applications, I prefer something like Go or Elixir, and compared with those languages, it's not clear what productivity advantage Python has, if any (and I've known Python since v1.5.2, so it's not a matter of familiarity).
We had a vendor who implemented their stack with Flask and Postgres on Debian. Their API is consistently slow (seconds to tens of seconds) to the point that we wrote our own app in Dotnet Core (running atop Postgres and Debian) that queries the available content once a day (500k rows of data) with minor refreshes hourly.

We take tens of milliseconds to query Postgres and generate a rendered HTML page for our clients. When we showed this to the vendor's devs, they were very surprised.

Admittedly, we do not operate at their scale, but I am certain this $5 a month droplet will keep running this app for a long time yet even with many users :)

Edit: I did write an MVP in Python using SQLAlchemy on top of Postgres, but performance still wasn't ideal when rendering hundreds or thousands of rows, and the primary developer was already using Dotnet Core.

Bad code is bad code in any language. Our e-commerce website jacobsparts.com is written in Python on Debian. It hits the DB and re-renders on every page load. I dare you to call it slow.
Awesome stuff! What are some optimizations you have used, if you don't mind me asking? What's the underlying framework?
How big are those responses? I’ve seen terrible performance come down to serialization, which can be addressed by swapping in a fast serialization library like orjson (https://github.com/ijl/orjson). Even then, though, you’d probably have a hard time getting down to tens of milliseconds. Other common culprits: poor indexing and N+1 queries.
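To make the N+1 pattern concrete: a minimal sketch using the stdlib `sqlite3` module as a stand-in for Postgres, with a hypothetical `author`/`book` schema. The first function issues one query for the list and then one more per row; the second gets the same data with a single JOIN. Against a networked Postgres server every extra query also pays a round trip, which is usually where the N+1 cost shows up.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema and data, in memory, just for the comparison
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY,
                           author_id INTEGER REFERENCES author(id), title TEXT);
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
        """
    )
    return conn


def books_n_plus_one(conn: sqlite3.Connection):
    """One query for the books, then one extra query per book: N+1 in total."""
    queries = 1
    books = conn.execute("SELECT id, author_id, title FROM book ORDER BY id").fetchall()
    result = []
    for _book_id, author_id, title in books:
        queries += 1
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result, queries


def books_joined(conn: sqlite3.Connection):
    """The same rows from a single JOIN: one query regardless of N."""
    rows = conn.execute(
        "SELECT b.title, a.name FROM book b "
        "JOIN author a ON a.id = b.author_id ORDER BY b.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    db = make_db()
    print(books_n_plus_one(db))
    print(books_joined(db))
```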
At my last company, there was an existing product (an AutoML product for business users) built on Flask, with a ton of serialization happening to populate the various charts in the GUI.

After I had left that team, I came back and swapped out the JSON serializer for orjson. It was about 5 lines of code, if I recall correctly. Performance skyrocketed: the GUI was noticeably more responsive when populating the charts and plots. By "noticeably" I mean it loaded in less than a third of the previous time. Definitely recommend it. It's written in Rust, and it inspired me to start learning the language.
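The comment doesn't show the actual change, so this is not the code from that project. It is a framework-neutral sketch assuming `orjson` is installed from PyPI, falling back to the stdlib `json` module when it isn't; `to_json_bytes` and `_json_default` are hypothetical helper names. The main thing to watch when swapping it in is that `orjson.dumps` returns `bytes` rather than `str`.

```python
import json
from datetime import datetime

try:
    import orjson  # third-party: pip install orjson
except ImportError:  # keep working (more slowly) without it
    orjson = None


def _json_default(obj):
    # Stdlib json can't encode datetime on its own; emit ISO 8601 as orjson does
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def to_json_bytes(payload) -> bytes:
    """Serialize payload to compact UTF-8 JSON, preferring orjson when available."""
    if orjson is not None:
        return orjson.dumps(payload)  # returns bytes; handles datetime natively
    return json.dumps(
        payload, default=_json_default, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


if __name__ == "__main__":
    chart = {"series": [1.5, 2.0, 3.25], "generated": datetime(2021, 1, 1, 12, 0)}
    print(to_json_bytes(chart))
```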

I’m not familiar with Dotnet, but I’m not sure Python is to blame here.

A more comparable rewrite would have been FastAPI with an asynchronous Postgres library (such as SQLAlchemy or Tortoise ORM).

There are probably ways to achieve similar results with Django or Flask, but it’s pretty easy with FastAPI.
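What the async route buys you is that one worker can overlap many database waits. A minimal sketch with plain `asyncio`, where `fake_query` is a hypothetical coroutine standing in for a real async driver call and `asyncio.sleep` models the latency. With an actual driver you would typically draw connections from a pool, since a single connection runs one query at a time. This helps with waiting; it does not reduce per-row ORM overhead.

```python
import asyncio
import time


async def fake_query(row_id: int, latency_s: float = 0.1) -> tuple:
    """Hypothetical stand-in for an async Postgres call; sleep models I/O latency."""
    await asyncio.sleep(latency_s)
    return (row_id, f"row-{row_id}")


async def handle_request(ids) -> list:
    # gather() runs the coroutines concurrently on one event loop, so the total
    # time is about one latency period instead of one per query
    return await asyncio.gather(*(fake_query(i) for i in ids))


if __name__ == "__main__":
    start = time.perf_counter()
    rows = asyncio.run(handle_request(range(10)))
    print(len(rows), f"{time.perf_counter() - start:.2f}s")
```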

To any experienced Python dev, the problem is obvious from their description. And it's understandable that someone inexperienced with Python would blame Python.

They were returning a large number of rows from Postgres (which, if the DB is properly set up, should take at most tens of ms, depending of course on the width of the rows), and most Python ORM libraries (well, I know of none that don't), SQLAlchemy included, have a huge "serialization" cost: turning the raw data from Postgres into objects. I once ran a benchmark, and the Django ORM and SQLAlchemy's ORM were something like 10-50x slower than fetching tuples with psycopg directly. If you wanted to avoid raw SQL, SQLAlchemy Core was fastest when fetching tuples (IIRC, a performance penalty of at most 100%, i.e. up to 2x slower), while Django's fetch-me-tuples functionality was also within a single-digit multiple of psycopg.

So, the solution to that problem is to fetch tuples, and then pass them in for rendering the page.
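A minimal sketch of that approach, using the stdlib `sqlite3` module as a stand-in for psycopg: both follow the DB-API, where `fetchall()` returns plain tuples by default. `Connection.execute` is a sqlite3 convenience shortcut; with psycopg you would create a cursor first. The `product` table, its columns, and the helper names are hypothetical, and a real app would more likely hand the tuples to a template engine than build strings by hand.

```python
import sqlite3
from html import escape


def fetch_rows(conn: sqlite3.Connection) -> list:
    # Plain DB-API cursors return tuples: no per-row model objects are built
    cur = conn.execute("SELECT sku, name, price_cents FROM product ORDER BY sku")
    return cur.fetchall()


def render_rows(rows) -> str:
    # Unpack each tuple straight into the markup, escaping the text columns
    parts = ["<table>"]
    for sku, name, price_cents in rows:
        parts.append(
            f"<tr><td>{escape(sku)}</td><td>{escape(name)}</td>"
            f"<td>{price_cents / 100:.2f}</td></tr>"
        )
    parts.append("</table>")
    return "".join(parts)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price_cents INTEGER)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [("A-1", "Hose clamp", 199), ("B-2", "Fuel <filter>", 1999)],
    )
    print(render_rows(fetch_rows(conn)))
```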

Of course, this also points to the problem with all the ORM implementations in Python: they are too "smart" and dynamic for their own good (and if they are all bad at it, Python isn't doing something right either, so the criticism is warranted).