by josephg 858 days ago
Yeah. I recently worked on a small web project being developed at a university. The project is written in Flask, and it presents a reasonably simple UI on top of some data living in a MySQL database.

When I started on the project, page loads often took 10 seconds or more. The web application is used by about 20 people and that was enough to bring their single beefy server to its knees. Someone in NY tried scraping the site the other week and the site became completely unresponsive. They resorted to banning the IP to keep the website up. The reasons it was slow were all the usual culprits - a misused ORM being the main one.

It’s a nice language, but I really felt like I’d been transported back in time a few decades working in it. It feels like I’m using a computer from the 90s, where performance choices matter again because the language is so slow, and where dependency management is a circus of half-working tools and half-hearted attempts at versioning. Packages conflict with one another. Some “pinned” package versions have apparently rusted and won’t actually install on my computer. And the system to install packages locally was obviously bolted on, badly, long after the horse had left the gate.

It reminds me of working in C in the early 2000s. I never thought I’d say this but it makes server side JavaScript with npm look positively modern and fast by comparison.

5 comments

That sounds like there are some quick wins to be made.

My dev machine is quite archaic compared to a modern server (a 2nd gen i5 ThinkPad, to be precise). It struggles to top 20ms for a request that loads a user and a data object, joined tables and all, via an ORM from Postgres running locally with a few hundred thousand records in those tables. That's before doing anything like explicitly adding caching.

Check your indexes, joins, general DB design and in-app looping. Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.
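
For anyone wanting to see what "check your indexes" looks like in practice, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for the MySQL/Postgres databases discussed in the thread, and the `records` table, its columns, and the `query_plan` helper are made up for illustration. It asks the database for its query plan before and after adding an index on the filtered column.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail text is what we want.
    return " | ".join(row[3] for row in rows)

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records (user_id, payload) VALUES (?, ?)",
    [(i % 1000, "x") for i in range(10_000)],
)

sql = "SELECT payload FROM records WHERE user_id = ?"

# Without an index on user_id the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_records_user_id ON records (user_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

MySQL and Postgres have their own `EXPLAIN` output formats, but the idea is the same: a full scan on a hot query is usually the first thing to look for.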

Yeah I spoke in past tense about the performance problems because, as you said, there were an awful lot of easy wins to be made. The site is about 2 orders of magnitude faster now, which is incredibly satisfying.

> Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.

I’m not so sure about that. It’s hard to run the experiment, but I’ve never seen a Node.js app run anywhere near that slowly. The default-synchronous nature of Python combined with its mediocre performance for straight code magnifies the impact of any bad design choices. At least in a Node.js application your server can happily run many SQL queries at the same time, or do other work while it waits for the database. I’m sure sufficiently mediocre web server code can bring Node.js to its knees. But in a decade of working with Node, I’ve never seen it done. Certainly not in a web app with only 20 users.
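
To make the blocking versus overlapping distinction concrete, here is a rough sketch in Python using asyncio. There are no real database calls: `fake_query` is a hypothetical stand-in that just sleeps to simulate a round trip, and the 50ms delay and the count of 5 are arbitrary. It compares awaiting each "query" in turn with letting them all wait at once.

```python
import asyncio
import time

async def fake_query(delay):
    """Stand-in for a database round trip (assumption: no real DB involved)."""
    await asyncio.sleep(delay)
    return delay

async def run_sequentially(n, delay):
    # Each query waits for the previous one to finish, like blocking code.
    return [await fake_query(delay) for _ in range(n)]

async def run_concurrently(n, delay):
    # All queries are in flight at once; total time is roughly one round trip.
    return list(await asyncio.gather(*(fake_query(delay) for _ in range(n))))

def timed(coro_fn, n, delay):
    """Run a coroutine function and return (elapsed seconds, results)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n, delay))
    return time.perf_counter() - start, results

sequential_time, sequential_results = timed(run_sequentially, 5, 0.05)
concurrent_time, concurrent_results = timed(run_concurrently, 5, 0.05)

print(f"sequential: {sequential_time:.3f}s, concurrent: {concurrent_time:.3f}s")
```

This only shows the waiting pattern. Whether a given Flask app and database driver can actually overlap queries like this depends on the stack in use.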

I once worked on a small web project, at a university, in Python, using WSGI IIRC. It loaded a lot faster than any of the big expensive apps the university had written.

Well, there was one exception. The little import statement to import the Oracle database client took maybe 15 seconds. MySQL for the win :)

(I would not recommend MySQL for new applications today, although I might recommend it over Oracle…)

Oracle EEE'd MySQL. MariaDB these days. Or postgres along with the rest of the singularity.

I have developed a lot of Python based websites (mostly Django), some quite complex, and I have very rarely seen anything that takes seconds to load - sometimes some database queries have been slow. In most cases load time is dominated by loading JS and images.

> The reasons it was slow were all the usual culprits - a misused ORM being the main one.

So, slow queries.

I have not had such bad issues with dependency management either. Not even with old stuff someone else wrote years ago.

I'm so confused by this. Python is really fast. That isn't to say other languages aren't a lot faster, but I can afford to be ungodly wasteful with CPU-bound tasks (in "leaf" programs; don't worry, I'm not doing this in libraries to be consumed by others) because it literally doesn't matter. The IO to call print(), write a log, or read a file on disk dwarfs the time actually spent running Python code, and this is before using the new JIT.

I wouldn't number crunch in Python without something like NumPy, because you'll pay the cost of Python's dynamism for nothing, but a lot of work has gone into making Python's primitives and standard library performant. I steal algorithms from CPython all the time.

I'm curious what the ORM misuse was, because Python can obviously handle much higher loads than that. Perhaps the ORM is to blame for offering some footgun. Or maybe the developer did something impossibly idiotic.

One big problem they had was that the system checked the user’s access permissions on every request. Access control in this application is quite complex, and so the access control code ended up issuing multiple queries and doing a lot of over-fetching to do its job. (The classic ORM problem.)

It turned out that this was also happening for all static assets. Oops. And the site is covered in very small images. Double oops.

All told, to load a single page the server was making over 150 SQL queries. And because it’s Python, those queries were all issued with blocking code. More than enough to keep the server busy for ages.
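
The actual code isn't shown here, so the following is only a sketch of the general pattern described above, not the project's implementation. It uses sqlite3 in place of MySQL, and the `images` and `permissions` tables, the `CountingDB` wrapper, and both functions are hypothetical. It contrasts a per-item permission check (one query per image) with a single joined query, and counts how many queries each approach issues.

```python
import sqlite3

class CountingDB:
    """Tiny wrapper that counts queries (hypothetical, for illustration only)."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

db = CountingDB()
db.conn.executescript("""
    CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE permissions (user_id INTEGER, image_id INTEGER);
""")
db.conn.executemany(
    "INSERT INTO images (id, name) VALUES (?, ?)",
    [(i, f"icon{i}.png") for i in range(1, 51)],
)
# User 1 may see the odd-numbered images only.
db.conn.executemany(
    "INSERT INTO permissions (user_id, image_id) VALUES (?, ?)",
    [(1, i) for i in range(1, 51, 2)],
)

def visible_images_per_item(db, user_id):
    """One query for the list, then one permission query per image."""
    visible = []
    for image_id, name in db.query("SELECT id, name FROM images ORDER BY id"):
        allowed = db.query(
            "SELECT 1 FROM permissions WHERE user_id = ? AND image_id = ?",
            (user_id, image_id),
        )
        if allowed:
            visible.append(name)
    return visible

def visible_images_joined(db, user_id):
    """Same answer from a single query that joins the two tables."""
    rows = db.query(
        "SELECT i.name FROM images i "
        "JOIN permissions p ON p.image_id = i.id "
        "WHERE p.user_id = ? ORDER BY i.id",
        (user_id,),
    )
    return [name for (name,) in rows]

db.queries = 0
slow_result = visible_images_per_item(db, 1)
slow_count = db.queries

db.queries = 0
fast_result = visible_images_joined(db, 1)
fast_count = db.queries

print(f"per-item: {slow_count} queries, joined: {fast_count} query")
```

With 50 images the per-item version issues 51 queries against 1 for the join, which is how a page covered in small images can plausibly reach triple digits.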

150 queries per page load is nothing. I'd check for missing indexes...

Woof! That's pretty rough.