Hacker News new | ask | show | jobs
by matthias509 1679 days ago
One point which I think bears mentioning with respect to performance is that even with O(1) access, the performance in the real world of a list implemented by a list of pointers vs a contiguous block of memory will be very different. This is because programs often iterate over lists and a list implemented by a contiguous block of ram will have MUCH better cache locality.

AFAIK, this unfortunately wouldn’t be possible to implement in Python though because I don’t think there would be any way to know how much space to allocate for each item in the list.

5 comments

In CPython at least, there's no other way. Everything is an object, even simple types like integers. And references to objects are implemented with pointers. Even a Numpy array which stores a contiguous array of simple values requires conversion to an object when you actually try to use those values.
For more efficient "lists" there are Numpy arrays https://numpy.org/doc/stable/reference/generated/numpy.array... which do have cache locality
Python does come with built-in arrays, but I've never seen them used, or heard anyone recommend them for anything.

https://docs.python.org/3/library/array.html

They are recommended when you want speed, but most people writing Python probably never need that much speed (or would probably want numpy anyway), and the few who do likely have more private sources to recommend it for specific occasions. It should probably be used for more things (instead of reaching for numpy at first chance) but is still a niche tool.
The times I've used array, it was a common data structure for communicating with a C extension. You could use a Python list, but that's not just slower but more of a pain to deal with on the C side. (This was before numpy was everywhere; I don't know how much fun that might be to work with from C.)

If you're going pure Python then space is a better reason than speed. I'm not sure an array is even as fast as a list.

I think I use struct much more when interfacing with C, but I can see array being useful for arrays specifically when you expect more than a couple of elements. Both are still quite niche though.
The standard array module was more relevant in the distant past, before version 2.6 added the bytearray type. Otherwise, numpy arrays are considerably more useful in virtually every circumstance.
I had never heard of them and when I try to Google Python array v set speed it just returns list v set. I did get the official documentation but it doesn’t go into speed.
These are useful when you need a pointer to an integer for a C or C++ API.
If you have numeric value you can use https://docs.python.org/3/library/array.html
But if you're in the real world you use numpy.ndarray instead (this underpins half of the Python ecosystem, more or less).
On a similar note: Has cpython implemented tagged pointers yet? I remember there being a lot more discussion about boxed types when Java sometimes boxed primitive types (at the expense of performance).

Why python gets away with it I don't know. Back when I still did python, integers below 256 and above -3 (?) were "cached" and had no boxing overhead. In 3.6 that was still the case. Either cpython lacks tagged pointers or has the worst tagging scheme in history.

Does it have to be one or the other? You could allocate blocks of memory for the list then when it will be filled up you allocate more and add a pointer to it.
This breaks O(1) access by index, because the objects in the list may have different sizes.