| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by matthias509 1679 days ago
	One point which I think bears mentioning with respect to performance is that even with O(1) access, the performance in the real world of a list implemented by a list of pointers vs a contiguous block of memory will be very different. This is because programs often iterate over lists and a list implemented by a contiguous block of ram will have MUCH better cache locality. AFAIK, this unfortunately wouldn’t be possible to implement in Python though because I don’t think there would be any way to know how much space to allocate for each item in the list.

5 comments

mark-r 1679 days ago

In CPython at least, there's no other way. Everything is an object, even simple types like integers. And references to objects are implemented with pointers. Even a Numpy array which stores a contiguous array of simple values requires conversion to an object when you actually try to use those values.

link

killingtime74 1679 days ago

For more efficient "lists" there are Numpy arrays https://numpy.org/doc/stable/reference/generated/numpy.array... which do have cache locality

link

wodenokoto 1679 days ago

Python does come with built-in arrays, but I've never seen them used, or heard anyone recommend them for anything.

https://docs.python.org/3/library/array.html

link

uranusjr 1679 days ago

They are recommended when you want speed, but most people writing Python probably never need that much speed (or would probably want numpy anyway), and the few who do likely have more private sources to recommend it for specific occasions. It should probably be used for more things (instead of reaching for numpy at first chance) but is still a niche tool.

link

abecedarius 1679 days ago

The times I've used array, it was a common data structure for communicating with a C extension. You could use a Python list, but that's not just slower but more of a pain to deal with on the C side. (This was before numpy was everywhere; I don't know how much fun that might be to work with from C.)

If you're going pure Python then space is a better reason than speed. I'm not sure an array is even as fast as a list.

link

uranusjr 1679 days ago

I think I use struct much more when interfacing with C, but I can see array being useful for arrays specifically when you expect more than a couple of elements. Both are still quite niche though.

link

kgm 1679 days ago

The standard array module was more relevant in the distant past, before version 2.6 added the bytearray type. Otherwise, numpy arrays are considerably more useful in virtually every circumstance.

link

edgyquant 1679 days ago

I had never heard of them and when I try to Google Python array v set speed it just returns list v set. I did get the official documentation but it doesn’t go into speed.

link

cozzyd 1679 days ago

These are useful when you need a pointer to an integer for a C or C++ API.

link

Znafon 1679 days ago

If you have numeric value you can use https://docs.python.org/3/library/array.html

link

kzrdude 1679 days ago

But if you're in the real world you use numpy.ndarray instead (this underpins half of the Python ecosystem, more or less).

link

bjoli 1678 days ago

On a similar note: Has cpython implemented tagged pointers yet? I remember there being a lot more discussion about boxed types when Java sometimes boxed primitive types (at the expense of performance).

Why python gets away with it I don't know. Back when I still did python, integers below 256 and above -3 (?) were "cached" and had no boxing overhead. In 3.6 that was still the case. Either cpython lacks tagged pointers or has the worst tagging scheme in history.

link

seiferteric 1679 days ago

Does it have to be one or the other? You could allocate blocks of memory for the list then when it will be filled up you allocate more and add a pointer to it.

link

dmurray 1679 days ago

This breaks O(1) access by index, because the objects in the list may have different sizes.

link