Hacker News new | ask | show | jobs
by mixmastamyk 4860 days ago
Question:

    def squares(n):
        sq = []
        for i in xrange(n):
            sq.append(i*i)
        return sq

    A basically idiomatic version of the same in Python. No list
    pre-allocation, so every iteration through the list we have the
    potential to need to resize the list and copy all the data. That's
    inefficient.
Is that true? I'd expect .append() to change a pointer or two, not "resize and copy" the list. Even an .insert() should just move pointers at the C-level... no need to "defrag" it. I guess the key word is potential.
3 comments

Python lists are contiguous blocks of memory (i.e., arrays), not linked lists, so append will sometimes have to resize and copy. It's true that amortized worst case time is O(1), but that's still going to take longer than the one allocation involved in pre-allocating an array of known size. I don't know whether (or, under what circumstances) the difference is large enough to be significant, though.
Being fairly new to C, is appending to / dynamically growing an array really just a matter of "a pointer or two"?

How can you take for granted the memory space past the end pointer is available?

On an array, you can't. This means that you can't on a Python list, either. mixmastamyk is mistaken about the implementation details.

But if you assume that "list" means "linked list", then you can just navigate to the correct part of the list, allocate enough space for one new cell, and stitch together a few pointers. Allocation and stitching is O(1). In general, navigating to part of the list is O(n), but if your list is a doubly-linked circular linked-list, or alternately if you keep a pointer to the end as a special case, then "navigate to the end of the list" becomes also O(1). I assume that all of this is what mixmastamyk was thinking Python was doing.

Thanks for the detailed response.

I was in fact taking it as almost a given that Python lists were backed by arrays under the hood.

I believe you're correct - Python lists should be O(1) for appending[0].

[0] http://wiki.python.org/moin/TimeComplexity

I think you're both right and wrong.

mixmastamyk's comment implies that (s)he believes that Python lists are, under the hood, linked lists. This is wrong. Python lists are ultimately backed by C arrays. This is why get() and set() are O(1), and insert() is O(n).

However, dynamically resizing an array to support append operations, if you're not stupid, takes amortized constant time. Individual operations may be O(n). Python implementers, happily, are not stupid.

However, insert() operations do require "defragmenting".

So, the primary question mixmastamyk asked about the cost of

    def squares(n):
        sq = []
        for i in xrange(n):
            sq.append(i*i)
        return sq
is totally correct, but a lot of the sub-reasoning is wrong.
Yes, I've always just assumed that internally a resizable Python list was a linked-list I learned about in C... fits perfectly.

I suppose using an array must improve performance in typical cases, while the resizing (a linked-list advantage) happens less often.

Resizing is not a linked-list advantage, though. Resizing is an ammortized-constant-time operation.

The only advantage I'm aware of for linked lists is insert and delete (but not append and pop), which are constant time in a linked list but O(n) in an array.

The thing is, though, that doing insert() on a linked-list usually requires a seek first. Which is O(n) on a linked-list (and may be O(n), O(logn), or O(1) on an array, depending on what you mean by "seek"). So in _practice_, usually inserts and deletes are O(n) in a linked-list as well. (But not always, because you could already have a pointer to the relevant thing, for example if you have multiple different pointed-based data structures with pointers into each other.)

Basically, as far as I can tell, linked-lists are almost useless, unless you're forced to write all your data structures from scratch and you have too little time to write yourself a proper library. The main exception is for really hairy shit where you have multiple linked-list views traversing the same data in different orders.

linked lists are the default data structure underlying almost every implementation of Stacks and Queues and the variants of those. they're far from useless.

they're also such a flexible data structure that it is literally the ONLY data structure necessary to implement any of the LISPs.

> linked lists are the default data structure underlying almost every implementation of Stacks and Queues

Citation needed. I doubt this.

Certainly if I had only ten minutes to implement a stack or a queue, without access to anything more than stdlib.h (or equivalent), a linked list is easy to get right in a hurry, and only takes a few dozen lines. But the auto-resizing array is only a little harder, and has better performance for nearly every operation, as I explained in the previous post.

> they're also such a flexible data structure that it is literally the ONLY data structure necessary to implement any of the LISPs.

Of course. So what? That's not a reason to use them anywhere other than a school assignment that requires you to use them.