Hacker News new | ask | show | jobs
by davidwtbuxton 3226 days ago
> Use format instead of + for generating strings — In Python, str is immutable, so the left and right strings have to be copied into the new string for every pair of concatenations.

It isn't always faster to use string formatting.

    $ python -m timeit -s 'a, b, c, d = "1234567890", "abcdefghij", "ABCDEFGHIJ", "0987654321"' 'a + b + c + d'
    10000000 loops, best of 3: 0.181 usec per loop
    $ python -m timeit -s 'a, b, c, d = "1234567890", "abcdefghij", "ABCDEFGHIJ", "0987654321"' '"{}{}{}{}".format(a, b, c, d)'
    1000000 loops, best of 3: 0.447 usec per loop
    $ python --version
    Python 2.7.13
2 comments

On pypy, + also wins for the 4x10 case:

  $ pypy -mperf timeit -s 'a, b, c, d = "1234567890", "abcdefghij", "ABCDEFGHIJ", "0987654321"' 'a + b + c + d'
  .........
  Mean +- std dev: 1.06 ns +- 0.04 ns
  $ pypy -mperf timeit -s 'a, b, c, d = "1234567890", "abcdefghij", "ABCDEFGHIJ", "0987654321"' '"{}{}{}{}".format(a, b, c, d)'
  ........
  Mean +- std dev: 45.8 ns +- 0.9 ns
  $ pypy -mperf timeit -s 'a, b, c, d = "1234567890", "abcdefghij", "ABCDEFGHIJ", "0987654321"' '"".join([a, b, c, d])'
  ........
  Mean +- std dev: 62.0 ns +- 4.8 ns
  $ pypy -mperf timeit -s 'a, b, c, d = "1234567890", "abcdefghij", "ABCDEFGHIJ", "0987654321"' '"%s%s%s%s" % (a, b, c, d)'
  ........
  Mean +- std dev: 78.3 ns +- 1.9 ns
On Python 2.7.10:

    In [2]: %timeit a+b+c+d
    The slowest run took 6.66 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 247 ns per loop

    In [4]: %timeit "{}{}{}{}".format(a, b, c, d)
    The slowest run took 6.37 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 709 ns per loop
On Python 3.6.1:

    In [3]: %timeit a+b+c+d
    355 ns ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [4]: %timeit "{}{}{}{}".format(a, b, c, d)
    630 ns ± 38.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [5]: %timeit f"{a}{b}{c}{d}"
    21.8 ns ± 1.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Edit: that last %timeit made me suspicious, so I dug a bit deeper with the dis module:

    In [8]: import dis

    In [9]: def f1():
       ...:     return a+b+c+d
       ...:

    In [10]: def f2():
        ...:     return "{}{}{}{}".format(a, b, c, d)
        ...:

    In [11]: def f3():
        ...:     return f"{a}{b}{c}{d}"
        ...:

    In [12]: dis.dis(f1)
      2           0 LOAD_GLOBAL              0 (a)
                  2 LOAD_GLOBAL              1 (b)
                  4 BINARY_ADD
                  6 LOAD_GLOBAL              2 (c)
                  8 BINARY_ADD
                 10 LOAD_GLOBAL              3 (d)
                 12 BINARY_ADD
                 14 RETURN_VALUE

    In [13]: dis.dis(f2)
      2           0 LOAD_CONST               1 ('{}{}{}{}')
                  2 LOAD_ATTR                0 (format)
                  4 LOAD_GLOBAL              1 (a)
                  6 LOAD_GLOBAL              2 (b)
                  8 LOAD_GLOBAL              3 (c)
                 10 LOAD_GLOBAL              4 (d)
                 12 CALL_FUNCTION            4
                 14 RETURN_VALUE

    In [14]: dis.dis(f3)
      2           0 LOAD_GLOBAL              0 (a)
                  2 FORMAT_VALUE             0
                  4 LOAD_GLOBAL              1 (b)
                  6 FORMAT_VALUE             0
                  8 LOAD_GLOBAL              2 (c)
                 10 FORMAT_VALUE             0
                 12 LOAD_GLOBAL              3 (d)
                 14 FORMAT_VALUE             0
                 16 BUILD_STRING             4
                 18 RETURN_VALUE

    In [15]: %timeit f1()
    415 ns ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [16]: %timeit f2
    31.4 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [17]: %timeit f2()
    727 ns ± 43.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [18]: %timeit f3()
    344 ns ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
My understanding: in the %timeit f"..." example, a constant value for the expression must have been calculated before the execution of the timings. When wrapping things around a function, I'm forcing the interpreter to actually evaluate the f-string on every call, so it's a more apples-to-apples comparison. With the dis module I also verify that there is no constant expression fixed on every function.

Timings are now closer, but the f-string is still a bit faster.

Edit 2: (final edit?) still for string concatenation, "".join(...) is the king:

    In [19]: %timeit "".join([a, b, c, d])
    365 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [20]: l = [a, b, c, d]

    In [21]: %timeit "".join(l)
    229 ns ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Notice that in [19] %timeit is also measuring the time to build the list in the first place, which is why I also tested with a prebuilt list in [21]. Building the list in the first place is a cost you have to pay, so it's not entirely free.

Still, all this optimizations make more sense for really long strings and fragments (substrings that you want to join). Extreme case of 256 1-long strings:

    In [43]: %timeit l[0] + l[1] + l[2] + l[3] + l[4] + l[5] + l[6] + l[7] + l[8] + l[9] + l[10] + l[11] + l[12] + l[13] + l[14] + l[15] + l[16] + l[17]
        ...:  + l[18] + l[19] + l[20] + l[21] + l[22] + l[23] + l[24] + l[25] + l[26] + l[27] + l[28] + l[29] + l[30] + l[31] + l[32] + l[33] + l[34] +
        ...: l[35] + l[36] + l[37] + l[38] + l[39] + l[40] + l[41] + l[42] + l[43] + l[44] + l[45] + l[46] + l[47] + l[48] + l[49] + l[50] + l[51] + l[5
        ...: 2] + l[53] + l[54] + l[55] + l[56] + l[57] + l[58] + l[59] + l[60] + l[61] + l[62] + l[63] + l[64] + l[65] + l[66] + l[67] + l[68] + l[69]
        ...: + l[70] + l[71] + l[72] + l[73] + l[74] + l[75] + l[76] + l[77] + l[78] + l[79] + l[80] + l[81] + l[82] + l[83] + l[84] + l[85] + l[86] + l
        ...: [87] + l[88] + l[89] + l[90] + l[91] + l[92] + l[93] + l[94] + l[95] + l[96] + l[97] + l[98] + l[99] + l[100] + l[101] + l[102] + l[103] +
        ...: l[104] + l[105] + l[106] + l[107] + l[108] + l[109] + l[110] + l[111] + l[112] + l[113] + l[114] + l[115] + l[116] + l[117] + l[118] + l[11
        ...: 9] + l[120] + l[121] + l[122] + l[123] + l[124] + l[125] + l[126] + l[127] + l[128] + l[129] + l[130] + l[131] + l[132] + l[133] + l[134] +
        ...:  l[135] + l[136] + l[137] + l[138] + l[139] + l[140] + l[141] + l[142] + l[143] + l[144] + l[145] + l[146] + l[147] + l[148] + l[149] + l[1
        ...: 50] + l[151] + l[152] + l[153] + l[154] + l[155] + l[156] + l[157] + l[158] + l[159] + l[160] + l[161] + l[162] + l[163] + l[164] + l[165]
        ...: + l[166] + l[167] + l[168] + l[169] + l[170] + l[171] + l[172] + l[173] + l[174] + l[175] + l[176] + l[177] + l[178] + l[179] + l[180] + l[
        ...: 181] + l[182] + l[183] + l[184] + l[185] + l[186] + l[187] + l[188] + l[189] + l[190] + l[191] + l[192] + l[193] + l[194] + l[195] + l[196]
        ...:  + l[197] + l[198] + l[199] + l[200] + l[201] + l[202] + l[203] + l[204] + l[205] + l[206] + l[207] + l[208] + l[209] + l[210] + l[211] + l
        ...: [212] + l[213] + l[214] + l[215] + l[216] + l[217] + l[218] + l[219] + l[220] + l[221] + l[222] + l[223] + l[224] + l[225] + l[226] + l[227
        ...: ] + l[228] + l[229] + l[230] + l[231] + l[232] + l[233] + l[234] + l[235] + l[236] + l[237] + l[238] + l[239] + l[240] + l[241] + l[242] +
        ...: l[243] + l[244] + l[245] + l[246] + l[247] + l[248] + l[249] + l[250] + l[251] + l[252] + l[253] + l[254] + l[255]
    27.9 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [44]: %timeit "".join(l)
    3.71 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So the conclusion is: Have an existing list to join? Use str.join(list); Otherwise, operator+ might be your friend, but f-strings are probably cleaner to handle (but you only get those in Py3.6+). As to str.format(): Nah, not good.

Didn't expect str.format() to turn out to be "so" poor, though.

But this conclusion is taken w/o looking into the "old" formatting approach, `"%s%s%s%s" % (a, b, c, d)`, though.

The reason join is fast for these cases is because join basically converts all iterables to a list first and figures out how much it has to join exactly.
It walks the iterables (no need to convert them to lists), but its main trick is actually knowing how much memory it'll need for the end result, so no re-allocations happen, simply a one pass (through multiple iterables) copy.
No, it really converts all iterables to lists (there is a utility function in the C API to do just that, because it's kinda hairy to do by hand. PySequence_ToList or so). Otherwise it would not be possible to calculate how much space is required beforehand, since iterables cannot be walked twice.