by natedub 5351 days ago
Actually, [x for x in xrange(..)] will produce only one list. In Python 2.x, range() returns a list, while xrange() returns a lazy iterable.

If you just need an iterable, itertools.repeat(0, 100) will do the trick.
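A minimal sketch of that lazy behaviour, which should run unchanged on Python 2 and 3. The variable names are only illustrative:

```python
from itertools import islice, repeat

# repeat(0, 100) is a lazy iterator: no 100-element list exists yet.
zeros = repeat(0, 100)

# Pull just the first three values; the iterator remembers its position.
first_three = list(islice(zeros, 3))

# Materialize whatever is left, only because a real list is needed here.
remaining = list(zeros)

print(first_three)
print(len(remaining))
```

Note that the iterator is single-use: once consumed, it yields nothing more, so wrap it in list() if you need to index or iterate twice.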


I suppose I didn't say it correctly. You're still maintaining two objects, though (the xrange iterable and the new list being built by the comprehension). The "optimization" is probably a moot point, because Python's handling of lists is pretty solid and this isn't the '90s anymore, but at least with the while loop you've only got one list object and an integer (which I would think is faster to increment than it is to produce the next value from the iterable object?).

Anywho, splitting hairs. Goladus had the most helpful comment (I even learned something).

This is way premature optimization, and it's based on a lot of assumptions. Did you know that, on CPython, your integer increment creates a new object each time (outside of the cached small-integer range, -5 to 256, IIRC)? Now, xrange might do its own increment, so let's call it even on object creation.
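A rough way to see the small-integer cache for yourself, assuming CPython (this is an implementation detail, not a language guarantee, and `same_object` is just an illustrative helper name):

```python
def same_object(n):
    """Build the integer n twice at runtime and report whether
    both results are the very same object (identity, not equality)."""
    a = int(str(n))  # constructed at runtime, so no constant folding
    b = int(str(n))
    return a is b


# Assumption: CPython. Small values come from a shared cache,
# large values are allocated as fresh objects each time.
small_shared = same_object(7)
large_shared = same_object(10**6)
print(small_shared, large_shared)
```

On other implementations (PyPy, Jython) the answers may differ, which is one more reason not to reason about performance from object counts alone.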

If either xrange or list comprehensions are implemented in C instead of Python, do you still think your version will run quicker? What do you think the likelihood of either or both of these being the case is on your Python implementation? How many name lookups and function calls do you think each version does? Do you think you should find out before writing longer and more complicated code to attempt to out-perform it?

On my machine, your implementation performed ~3 times slower than the naive idiomatic "[0 for _ in xrange(100)]" and closer to 4 times slower when I bumped the list size up to 20000. And your version was ~32 times slower than "[0] * 100" and around 60 times slower when I bumped the list size up to 20000.

So please, don't optimize without measuring and instead just write idiomatic code the first time.

The code, for reference:

    def mk_list_1(size):
        ls  = []
        cnt = 0

        while cnt < size:
            ls.append(0)
            cnt += 1

        return ls


    def mk_list_2(size):
        return [0 for i in xrange(size)]

    def mk_list_3(size):
        return [0] * size

    from timeit import timeit
    args = {
        "number":1000000,
        "setup":"from __main__ import mk_list_1, mk_list_2, mk_list_3"
    }
    print "Executing %i runs:" % args["number"]
    print "mk_list_1 took %i s" % timeit('mk_list_1(100)', **args)
    print "mk_list_2 took %i s" % timeit('mk_list_2(100)', **args)
    print "mk_list_3 took %i s" % timeit('mk_list_3(100)', **args)

Output on my machine:

  Executing 1000000 runs:
  mk_list_1 took 32 s
  mk_list_2 took 10 s
  mk_list_3 took 1 s

Fair enough, thanks for the comment.
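
For anyone rerunning this today, here is a rough Python 3 port of the benchmark above. It assumes `range` in place of `xrange`, uses `<` in the loop so every variant builds exactly `size` items, and defaults to a much smaller run count so it finishes quickly. The function names are illustrative, and absolute timings will vary by machine and interpreter:

```python
from timeit import timeit


def mk_list_while(size):
    # Manual counter loop: the hand-"optimized" variant.
    ls = []
    cnt = 0
    while cnt < size:
        ls.append(0)
        cnt += 1
    return ls


def mk_list_comp(size):
    # Idiomatic list comprehension.
    return [0 for _ in range(size)]


def mk_list_mul(size):
    # Sequence repetition.
    return [0] * size


def bench(size=100, number=10000):
    """Return {function name: total seconds for `number` calls}."""
    variants = (mk_list_while, mk_list_comp, mk_list_mul)
    return {
        f.__name__: timeit(lambda f=f: f(size), number=number)
        for f in variants
    }


if __name__ == "__main__":
    for name, seconds in bench().items():
        print("%s took %.4f s" % (name, seconds))
```

Raise `number` or `size` to get steadier figures; the point is to measure on your own interpreter rather than trust anyone's intuition, including the numbers quoted in this thread.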