Hacker News new | ask | show | jobs
by ilyagr 2072 days ago
I found the following talks by Armin Romacher very informative on these topics (C API, why python is more difficult to speed up than JS).

https://youtu.be/qCGofLIzX6g https://youtu.be/IeSu_odkI5I

I wish these were the things Python 3 addressed, rather than Unicode. I guess it's much more obvious in hindsight than back when Python 3 was designed.

4 comments

I would guess that, if Python 3 hadn't addressed Unicode, Python would never have come to a place where so many people are worried about its performance.

Python's still a great language for the things it was being designed for back in the 2000s. But adding decent Unicode support is a big part of what helped it become an attractive language for use cases where I wish it performed better or had better support for parallelism. Natural language processing, for example.

Any other example? Because there are lots of high-performance computing tasks where unicode couldn't matter less.
How do you square that assertion with the fact that people clung so hard to python 2 that it took the PSF 12 years to finally kill it?
Some people clung hard to 2. Others flocked to 3.

In my direct experience, the only people who waited until the bitter end (and beyond) were ops folks who never had to stray much outside of 7-bit ASCII, and companies with large existing codebases that didn't want to allocate the resources to migrating. Neither of those really have much to do with my assertion that Python 3 attracted new people doing new things.

Just want to say thanks for these links, very interesting so far.

A point made in the video that seems to highlight the issue:

> Just adding two numbers requires 400 lines of code.

In compiled languages, this is one instruction! Think about the cache thrashing and memory loading involved in this one operation too. How can this possibly be fixed?

Python is a great language, but I don't know if it can ever be high performance on its own.

Which compiled language adds 680564733841876926926749214863536422912 and 35370553733215749514562618584237555997034634776827523327290883 in one instruction?

FWIW, here's the relevant dispatch code in Python's ceval.c where you see it uses a very generic dispatching at that level, which eventually, deeper down, gets down to the "oh, it's an integer!"

        case TARGET(BINARY_ADD): {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            /* NOTE(haypo): Please don't try to micro-optimize int+int on
               CPython using bytecode, it is simply worthless.
               See http://bugs.python.org/issue21955 and
               http://bugs.python.org/issue10044 for the discussion. In short,
               no patch shown any impact on a realistic benchmark, only a minor
               speedup on microbenchmarks. */
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(tstate, left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH();
        }
Python code can be made more high performance if there's some way to tell the implementation the types, either explicitly or by inference or tracing. That's how several of those listed projects get their performance.
Of course bigints require more than one instruction to add them, but even then you can reduce the work at compile time down to a series of integer operations, whereas the above code requires interpretting the program before it even gets to the add.

In your example text processing in `unicode_concatenate` is going to be very, very much slower than a bulk load of the native numerical data directly from memory and processing it. For each character, Python needs to check a number is still a number at run time then convert the result to a native numeric. I can only assume this string processing is at worst performed once and cached(?), because otherwise it doesn't seem like it would run well at all and surely Python's bigint performance is pretty important.

> Python code can be made more high performance if there's some way to tell the implementation the types, either explicitly or by inference or tracing.

At that stage, I would just use Nim and get better performance and a decent static type system included and either call it from Python, or call Python from Nim.

You did write "two numbers" ;)

Guess I could also have used 5j + 3 as a counter-example.

If this is an issue then at this stage, many Python people switch to use one of the alternatives mentioned here, like Cython, which is a Python-like language which includes a static type system (including support for C++ templates) and can easily generate C extensions that can call and be called from Python.

Note that the version of BINARY_ADD you're looking at is newer than what the talk referenced. The "fast path" for integer addition was removed which the talk still talked about. You can see a discussion about that linked in the comment of the code you pasted.
unicode absolutely had to be done. it'd be even more insane to leave strings as they were. maybe if you never venture outside of 7 bits it's only pain with negative ROI, but trust me the world has more languages than english and first-class support for unicode strings as just strings is a must. it was a painful transition but a necessary one. all other modern languages simply started there (and they're old enough to have a beer, too).
(OP is Armin)