by inglesp 4661 days ago
Note that since this article was written (2004) CPython performs an in-place optimisation for assignments of the form

    s1 += s2
There are details at point six of http://docs.python.org/2/library/stdtypes.html#sequence-type..., where it also says that str.join() is preferable.
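For anyone who wants to see the two idioms side by side, here is a minimal sketch. The function names `concat_plus` and `concat_join` are made up for illustration, and the code is written so it also runs on Python 3. It only shows that both produce the same string; whether `+=` is fast depends on the interpreter, which is the point of the docs note.

```python
def concat_plus(parts):
    # Repeated in-place concatenation; CPython may resize the string
    # in place, but other implementations are free to copy every time.
    result = ""
    for part in parts:
        result += part
    return result


def concat_join(parts):
    # Single pass over the pieces; the form the docs recommend.
    return "".join(parts)


parts = [str(n) for n in range(1000)]
assert concat_plus(parts) == concat_join(parts)
```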

OK, so for fun I ran this on my machine (MacBook Pro, 2.53GHz, 10.8.4, 8GB RAM, Python 2.7.2 shipped by Apple). I had to emulate the old timing module using code from [1].

    Method 1: 0.115 seconds
    Method 2: I gave up after >120s
    Method 3: 0.265
    Method 4: 0.160
    Method 5: 0.220
    Method 6: 0.098
I ran each one a few times to make sure the times were roughly correct. Method 6 is still the fastest, but the naive method 1 is really close; it obviously got optimized. Actually, they're all pretty close, with the obvious (and hideous) outlier of using MutableString.

EDIT:

I just remembered I have an old version of PyPy (1.8) on my Mac. Thought I'd give that a try.

    Method 1: I gave up after >120s
    Method 2: I gave up after >120s
    Method 3: 0.090 seconds
    Method 4: 0.102
    Method 5: 0.430
    Method 6: 0.102
Method one is a problem again, and method 5 (the pseudo file) is noticeably slower. Otherwise the results aren't too far off.

[1] http://effbot.org/librarybook/timing.htm
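In case the link rots: a rough sketch of what such a shim for the old timing module can look like. This is not the code from [1]. It assumes only `start()`, `finish()`, `seconds()`, `milli()` and `micro()` are needed, and it uses `time.perf_counter()` so it runs on Python 3 (on 2.7 you would have to swap in another clock).

```python
import time

# Module-level state, mimicking the old timing module's global stopwatch.
_start = 0.0
_finish = 0.0


def start():
    global _start
    _start = time.perf_counter()


def finish():
    global _finish
    _finish = time.perf_counter()


def seconds():
    # Whole seconds between start() and finish().
    return int(_finish - _start)


def milli():
    # Elapsed time in whole milliseconds.
    return int((_finish - _start) * 1000)


def micro():
    # Elapsed time in whole microseconds.
    return int((_finish - _start) * 1000000)


start()
"".join(str(n) for n in range(100000))
finish()
print("elapsed: %d ms" % milli())
```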

Similar relative results, except that method 1 is always the fastest, even better than method 6. I ran large loop counts (10 million and 30 million) to reduce measurement noise. Profiling was with cProfile on a Core i3 2.53GHz, 6GB RAM, Python 2.7.3 on Ubuntu 12.04.

For a 10M loop count: method 1 -> 1.599 s, method 6 -> 1.91 s

For a 30M loop count: method 1 -> 4.967 s, method 6 -> 5.871 s

Summary: The KISS s1 += s2 always wins
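If you want to repeat this comparison yourself, something like the following should do. It is a sketch using `timeit` rather than cProfile, and it assumes method 1 is the plain `+=` loop and method 6 is a join over a list comprehension. `LOOP_COUNT` is kept tiny here so it finishes quickly; raise it to get closer to the runs above. No numbers are claimed, since they will differ per machine and interpreter.

```python
import timeit

LOOP_COUNT = 20000  # far smaller than the 10M/30M runs quoted above


def method1():
    # Naive concatenation with +=.
    out = ""
    for num in range(LOOP_COUNT):
        out += str(num)
    return out


def method6():
    # Build all pieces with a list comprehension, then join once.
    return "".join([str(num) for num in range(LOOP_COUNT)])


# Both approaches must build the same string.
assert method1() == method6()

# Take the best of a few repeats to reduce measurement noise.
t1 = min(timeit.repeat(method1, number=5, repeat=3))
t6 = min(timeit.repeat(method6, number=5, repeat=3))
print("method 1: %.4f s, method 6: %.4f s" % (t1, t6))
```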

Does PyPy have the same heuristic? If not, I wouldn't recommend relying on it.
If not, I would recommend submitting a bug to PyPy.
No, it doesn't, and that heuristic isn't possible in PyPy. It relies on checking the reference count and then mutating a supposedly immutable object. It's pretty awful.
If GP is referring to CPython patch #980695, it appears to be available in PyPy, but requires an additional flag.

"We have it, just not enabled by default. --objspace-with-strbuf I think" [1]

[1] https://mail.python.org/pipermail/python-dev/2013-February/1...