Hacker News new | ask | show | jobs
by kstrauser 4667 days ago
How would a method 7 using generator expressions fare on the same system, like:

    return ''.join(`num` for num in xrange(loop_count))
On one hand, it avoids creating a temporary list in memory. On the other, it can't know in advance how long the final output of the loop will be and so couldn't use tricks like preallocating enough RAM.
2 comments

Interestingly enough, this is pretty much the example given for the timemit module:

http://docs.python.org/2/library/timeit.html

Results:

    $ python -m timeit '"-".join(str(n) for n in range(100))'

    10000 loops, best of 3: 40.3 usec per loop

    $ python -m timeit '"-".join([str(n) for n in range(100)])'

    10000 loops, best of 3: 33.4 usec per loop

    $ python -m timeit '"-".join(map(str, range(100)))'

    10000 loops, best of 3: 25.2 usec per loop
I just ran those with Python 2.7 (ASCII and Unicode) and Python 3.2 (Unicode). "-".join(map(str, range(100))) was always the fastest, with ASCII (2.7) < Unicode (3.2) < Unicode (2.7). Makes sense, but worth keeping in mind when porting code from Python 2 to 3.

Full results: https://gist.github.com/dbarlett/6479378

Does join scan the entire input to know the exact size and allocate that at the start? Doesn't it do the usual thing for resizing and that is to double the capacity? A generator will save a temporary list, it will however result in a lot of overhead for the generator switches. It might be beneficial if the input is really big.
Does join scan the entire input to know the exact size and allocate that at the start?

No, because that would make the algorithm quadratic again (one loop for the scan, second loop for the concatenation), and the whole point of the join idiom, as recommended by the Python documentation, is to avoid a quadratic algorithm.

Two loops are still linear as long as they are not nested ;-)
Oops, yes, good point. :)