by lukehutch 2045 days ago
This is a straw man argument. Python's associative array code is written in C, not in Python, and Redis is written in C. So you're comparing C to C. The 40% loss in performance is due to Python being much slower at doing the stuff that's actually written in Python.
7 comments

Although the CPython and Redis hash tables are both implemented in C, CPython's open-addressing hash table[1] is far superior to the simple chaining-based hash table[2] in Redis. Python's performance depends heavily on its hash table implementation, which has been optimized over and over for decades now.

[1]: https://github.com/python/cpython/blob/master/Objects/dictob...

[2]: https://github.com/redis/redis/blob/unstable/src/dict.c
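For readers unfamiliar with the two collision strategies mentioned above, here is a toy Python sketch of each. It assumes a fixed capacity, linear probing, and no resizing or deletion, and the class names are hypothetical; it illustrates the general techniques only and is not how CPython or Redis actually implement their tables.

```python
class ChainedTable:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the bucket

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)


class OpenAddressingTable:
    """Open addressing with linear probing: entries live in one flat array."""

    _EMPTY = object()

    def __init__(self, capacity=8):
        self.slots = [self._EMPTY] * capacity

    def _probe(self, key):
        # Visit every slot once, starting at the key's home position.
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def put(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY or slot[0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table full (toy sketch does not resize)")

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                break  # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)
```

The usual argument for open addressing is that entries sit in one contiguous array, which tends to be friendlier to CPU caches than following per-bucket chains. Whether that explains any measured difference here is not something this sketch can show.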

Well of course it is calling C. The point of this argument is that you don't have to go all in and write your code in C to get good performance. You can just write good Python and get almost as fast code as C that will take you 5 times less time to develop.
>>You can just write good Python and get almost as fast code as C that will take you 5 times less time to develop.

In this case we're talking about a >2x slowdown.

The service also has fewer features.

You're talking about development time in a context where development time is largely irrelevant, as this is a service that people reuse and pick based on performance, because lower performance means higher costs to scale enough to meet requirements.
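As a back-of-the-envelope sketch of that cost argument: the function below estimates how many servers are needed to hit a target request rate. The throughput numbers in the usage note are made up for illustration and are not measurements of Redis or of the Python port.

```python
import math


def servers_needed(target_rps, per_server_rps):
    """Smallest number of servers whose combined throughput meets the target."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return math.ceil(target_rps / per_server_rps)
```

With a hypothetical target of 1,000,000 requests per second, a server that handles 100,000 requests per second needs servers_needed(1_000_000, 100_000) == 10 machines, while one running at half that throughput needs servers_needed(1_000_000, 50_000) == 20.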

He is comparing a version 4 release of a product used by millions with a POC. I suspect that if you took a look at an early release of Redis, performance and features would be a lot closer.
OP specifically did a Python vs. C comparison. Your assertions have nothing to do with the claim being discussed.

Even so, if you really want to discuss specific technical aspects, you would do well to keep in mind Python's notorious and widely known performance problems, such as those due to Python's GIL, and the fact that even in performance-oriented benchmarks Python lags way behind other languages, especially C.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

If it was easy or possible to mitigate Python's notorious performance issues then we would see the results in these synthetic benchmarks. But we don't, no matter how much time has been devoted to them.

Speaking as someone who writes Python for a living, Python aficionados would do well to avoid misrepresenting Python in ways that a) are easy to verify and refute, and b) go against its main technical characteristics. Python is awesome for churning out quick and dirty POCs, exploratory code, glue code that ties together performance-critical parts, utilities, and non-performance-critical applications. Performance-critical applications are not, nor were they ever, Python's thing. Once we see Python aficionados trying to claim that their hammer is the best screwdriver around, we start to sell a losing proposition.

>You can just write good Python and get almost as fast code as C that will take you 5 times less time to develop.

Sure, if someone has done the actual core work in C already that is.

Which, for uvloop, somebody already had - for a completely different purpose.

I think most people are usually quite surprised at how concentrated and how generic most hot paths can be. There's a heavy power-law distribution for the lines of code that most programs spend their time on.
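One way to see where a program's time actually goes is to profile it. The sketch below uses the standard library's cProfile and pstats; the workload (parse, summarize, main) is a hypothetical stand-in, and profile_top is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def parse(line):
    # Hypothetical "hot" function: called once per input line.
    return [int(x) for x in line.split(",")]


def summarize(rows):
    # Hypothetical "cold" function: called once.
    return sum(sum(r) for r in rows)


def main():
    lines = ["1,2,3,4,5"] * 50_000
    return summarize([parse(line) for line in lines])


def profile_top(func, limit=10):
    """Run func under cProfile; return its result and a text report
    of the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    result, report = profile_top(main)
    print(result)
    print(report)
```

The report lists call counts and times per function, which is usually enough to tell whether a handful of functions dominate the run.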

This thing has all kinds of thread safety issues which, if they were actually addressed, would make the implementation significantly larger and slower. I'm not really sure what the point of this is other than to say that Python dicts are pretty fast. But we already knew that.
Redis itself is single-threaded though (or was, it's only "mostly" single-threaded now)
Yes, but the point is that you can only do that in some special cases like this one.
But isn't this exactly what the author is demonstrating? You can use the higher language of Python with its garbage collection, and since the critical parts are in C anyway, it ends up being competitive to the pure C implementation.
"competitive"

Right, so let me get this straight, the argument is:

You can implement an arbitrary system in python and it is 'fast enough' to be usable and 'better' in terms of 1) speed to develop, 2) lower complexity (ie. lower LoC, easier to maintain) and 3) the garbage collection doesn't matter?

I strongly disagree.

I've worked in python for a long time, and it's a great glue language... but, it's not suitable for implementing high performance systems. Flat out.

Not. Suitable.

If the system you're developing is a mild variant on 1) something that already exists and 2) is implemented in a lower level language, then yes, python is a reasonable glue language to link together native modules.

That's why many of the machine learning frameworks use python; because it's great at allowing you to express 'high level concepts' using low level primitives.

However.

It is not suitable for implementing low level primitives, because it's too slow and single threaded.

So... you might argue that this redis implementation uses enough pre-existing code that someone else has written that it is reasonably performant, but... once you go beyond the 'trivial' implementation that uses someone else's code, you'll find it's really not suitable for this kind of use-case.

I love python; but this is... it's just wishful thinking.

Just because you like python, does not make python suitable for every workload.

You say that Python is unsuitable here but isn’t that subjective?

60% of the speed of Redis with its core functionality could be more than suitable for some.

> It is not suitable for implementing low level primitives; because its too slow and single threaded.

Redis was single-threaded for much of its life. That didn’t stop it from excelling.

I’ll add that I have also worked predominantly in Python. One of the things that frustrates me about it is the packages built on lower-level languages that need compilation during a ‘pip install’. Hunting down dependencies gets old pretty fast when you were expecting ‘just Python’. For those who are ok with that and the performance hit, Python can for sure be a suitable tool for use cases that would traditionally be tackled lower down the stack.

Not really. Python is orders of magnitude slower, to the point that it is not possible to implement low level primitives in pure Python.

There are no pure python low level primitives; everything is either a) wrapper, or, b) slow as hell and uses memory like a hog.

Python that wraps another language is a perfectly good way of doing things; but those low level primitives are never written in python.

...and neither are the low level primitives used in this case (see https://github.com/redis/hiredis).
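A rough way to check the pure-Python versus C-backed gap on your own machine is to time the same work done both ways. The sketch below compares a hand-written summation loop with the builtin sum using timeit; py_sum and compare are hypothetical names, and the actual ratio depends on the interpreter and hardware, so no expected numbers are given.

```python
import timeit


def py_sum(values):
    """Pure-Python summation: every iteration runs in the interpreter loop."""
    total = 0
    for v in values:
        total += v
    return total


def compare(n=100_000, repeat=5, number=20):
    """Return (pure_python_seconds, builtin_seconds), best of `repeat` runs."""
    data = list(range(n))
    assert py_sum(data) == sum(data)  # both paths must agree before timing
    t_py = min(timeit.repeat(lambda: py_sum(data), repeat=repeat, number=number))
    t_builtin = min(timeit.repeat(lambda: sum(data), repeat=repeat, number=number))
    return t_py, t_builtin


if __name__ == "__main__":
    t_py, t_builtin = compare()
    print(f"pure Python: {t_py:.4f}s  builtin sum: {t_builtin:.4f}s")
```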

I think we’re on the same page here with thinking Python wrapping another language is an alright way to do things.

There might be room for going lower with Python via interpreters like PyPy. Memory usage will still be high but speed will improve and you get the benefit of Python’s ease of use. For some that’s what matters most.

Personally I’m looking for a language that marries Python’s ease of composition and simple package building with typing that speeds up development in an IDE. I haven’t found that language yet. I’d be interested to hear any suggestions you have if you’ve gone down that road.

That's not how it's framed though. It mentions nowhere that most of the work is actually done in C and he literally states that "The aim of this exercise is to prove that interpreted languages can be just as fast as C" which is just incredibly misleading.

Or look at the start of the readme, he claims that he wants to disprove "some of the falsehoods about performance and optimisation regarding software and interpreted languages in particular". What falsehood exactly? To me it seems he intended it to be "interpreted languages are slow", but he doesn't disprove that at all.

Well, 40% slower isn’t exactly what I’d call competitive...
In what universe? People use Java, C#, Go, Node etc services that are 40% or more slower than equivalent C ones, and they're just fine with it...
Yes, it’s probably fast enough to be perfectly usable.

The point here is that implementing new features will be impossible, because this relies on an existing implementation that is not in python.

It is therefore not a proof that arbitrary high performance applications can easily be written in python.

It is simply an example of how high performance applications based on existing implementations someone else had written in another language can be wrapped in python.

>The point here is that implementing new features will be impossible, because this relies on an existing implementation that is not in python.

Isn't it the opposite? Implementing new features will be easier, because it uses the C backend as a helper library (for parsing, eventing), so all the business logic is in Python, which is easier to extend.

This is also why people use embedded Python/Lua/etc. in games, 3D programs, and so on.

If it was pointed out to their higher-ups that their infrastructure costs could be reduced by a sizable amount by changing language they might not be so fine with it.
The last thing most higher-ups care about are infrastructure costs and optimizations.

And in many cases you're the higher up, and startup founders take decisions to use slower but more flexible technologies every day...

No, because you'd rely on the fact that the time-critical execution paths in your Python application happen to be in C. This is clearly not true in general, and therefore pydis just demonstrates a special case.
Not just that, but hitting 50% or 60% of the target performance is usually not that difficult. It is always the last 10-20% of extra performance that is really difficult to hit and that might influence design decisions early on. Some of these frameworks really have tiny performance differences, on the order of single-digit percentages. Hitting 60% of any framework's performance is not really a feat.
What if you run it in pypy?
So, if my Python program uses a lot of dictionary operations, I can claim that my Python program is practically written in C?
All the code of this project is in Python, so who cares about what it is using under the surface?

With that reasoning, you're actually comparing machine code to machine code. Why even compare the performance of any languages at this point, since you're comparing machine code to machine code in the end?

> All the code of this project is in Python, so who cares about what it is using under the surface?

Because this can be considered a special case, where the required critical logic is available as a library, which is not something that can be assumed as a general case in product development.