Hacker News new | ask | show | jobs
by arc776 2071 days ago
Just want to say thanks for these links, very interesting so far.

A point made in the video that seems to highlight the issue:

> Just adding two numbers requires 400 lines of code.

In compiled languages, this is one instruction! Think about the cache thrashing and memory loading involved in this one operation too. How can this possibly be fixed?

Python is a great language, but I don't know if it can ever be high performance on its own.

1 comments

Which compiled language adds 680564733841876926926749214863536422912 and 35370553733215749514562618584237555997034634776827523327290883 in one instruction?

FWIW, here's the relevant dispatch code in Python's ceval.c where you see it uses a very generic dispatching at that level, which eventually, deeper down, gets down to the "oh, it's an integer!"

        case TARGET(BINARY_ADD): {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            /* NOTE(haypo): Please don't try to micro-optimize int+int on
               CPython using bytecode, it is simply worthless.
               See http://bugs.python.org/issue21955 and
               http://bugs.python.org/issue10044 for the discussion. In short,
               no patch shown any impact on a realistic benchmark, only a minor
               speedup on microbenchmarks. */
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(tstate, left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH();
        }
Python code can be made more high performance if there's some way to tell the implementation the types, either explicitly or by inference or tracing. That's how several of those listed projects get their performance.
Of course bigints require more than one instruction to add them, but even then you can reduce the work at compile time down to a series of integer operations, whereas the above code requires interpretting the program before it even gets to the add.

In your example text processing in `unicode_concatenate` is going to be very, very much slower than a bulk load of the native numerical data directly from memory and processing it. For each character, Python needs to check a number is still a number at run time then convert the result to a native numeric. I can only assume this string processing is at worst performed once and cached(?), because otherwise it doesn't seem like it would run well at all and surely Python's bigint performance is pretty important.

> Python code can be made more high performance if there's some way to tell the implementation the types, either explicitly or by inference or tracing.

At that stage, I would just use Nim and get better performance and a decent static type system included and either call it from Python, or call Python from Nim.

You did write "two numbers" ;)

Guess I could also have used 5j + 3 as a counter-example.

If this is an issue then at this stage, many Python people switch to use one of the alternatives mentioned here, like Cython, which is a Python-like language which includes a static type system (including support for C++ templates) and can easily generate C extensions that can call and be called from Python.

Note that the version of BINARY_ADD you're looking at is newer than what the talk referenced. The "fast path" for integer addition was removed which the talk still talked about. You can see a discussion about that linked in the comment of the code you pasted.