| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dr_zoidberg 3226 days ago

On Python 2.7.10:

    In [2]: %timeit a+b+c+d
    The slowest run took 6.66 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 247 ns per loop

    In [4]: %timeit "{}{}{}{}".format(a, b, c, d)
    The slowest run took 6.37 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 709 ns per loop

On Python 3.6.1:

    In [3]: %timeit a+b+c+d
    355 ns ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [4]: %timeit "{}{}{}{}".format(a, b, c, d)
    630 ns ± 38.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [5]: %timeit f"{a}{b}{c}{d}"
    21.8 ns ± 1.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Edit: that last %timeit made me suspicious, so I dug a bit deeper with the dis module:

    In [8]: import dis

    In [9]: def f1():
       ...:     return a+b+c+d
       ...:

    In [10]: def f2():
        ...:     return "{}{}{}{}".format(a, b, c, d)
        ...:

    In [11]: def f3():
        ...:     return f"{a}{b}{c}{d}"
        ...:

    In [12]: dis.dis(f1)
      2           0 LOAD_GLOBAL              0 (a)
                  2 LOAD_GLOBAL              1 (b)
                  4 BINARY_ADD
                  6 LOAD_GLOBAL              2 (c)
                  8 BINARY_ADD
                 10 LOAD_GLOBAL              3 (d)
                 12 BINARY_ADD
                 14 RETURN_VALUE

    In [13]: dis.dis(f2)
      2           0 LOAD_CONST               1 ('{}{}{}{}')
                  2 LOAD_ATTR                0 (format)
                  4 LOAD_GLOBAL              1 (a)
                  6 LOAD_GLOBAL              2 (b)
                  8 LOAD_GLOBAL              3 (c)
                 10 LOAD_GLOBAL              4 (d)
                 12 CALL_FUNCTION            4
                 14 RETURN_VALUE

    In [14]: dis.dis(f3)
      2           0 LOAD_GLOBAL              0 (a)
                  2 FORMAT_VALUE             0
                  4 LOAD_GLOBAL              1 (b)
                  6 FORMAT_VALUE             0
                  8 LOAD_GLOBAL              2 (c)
                 10 FORMAT_VALUE             0
                 12 LOAD_GLOBAL              3 (d)
                 14 FORMAT_VALUE             0
                 16 BUILD_STRING             4
                 18 RETURN_VALUE

    In [15]: %timeit f1()
    415 ns ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [16]: %timeit f2
    31.4 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [17]: %timeit f2()
    727 ns ± 43.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [18]: %timeit f3()
    344 ns ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

My understanding: in the %timeit f"..." example, a constant value for the expression must have been calculated before the execution of the timings. When wrapping things around a function, I'm forcing the interpreter to actually evaluate the f-string on every call, so it's a more apples-to-apples comparison. With the dis module I also verify that there is no constant expression fixed on every function.

Timings are now closer, but the f-string is still a bit faster.

Edit 2: (final edit?) still for string concatenation, "".join(...) is the king:

    In [19]: %timeit "".join([a, b, c, d])
    365 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [20]: l = [a, b, c, d]

    In [21]: %timeit "".join(l)
    229 ns ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Notice that in [19] %timeit is also measuring the time to build the list in the first place, which is why I also tested with a prebuilt list in [21]. Building the list in the first place is a cost you have to pay, so it's not entirely free.

Still, all this optimizations make more sense for really long strings and fragments (substrings that you want to join). Extreme case of 256 1-long strings:

    In [43]: %timeit l[0] + l[1] + l[2] + l[3] + l[4] + l[5] + l[6] + l[7] + l[8] + l[9] + l[10] + l[11] + l[12] + l[13] + l[14] + l[15] + l[16] + l[17]
        ...:  + l[18] + l[19] + l[20] + l[21] + l[22] + l[23] + l[24] + l[25] + l[26] + l[27] + l[28] + l[29] + l[30] + l[31] + l[32] + l[33] + l[34] +
        ...: l[35] + l[36] + l[37] + l[38] + l[39] + l[40] + l[41] + l[42] + l[43] + l[44] + l[45] + l[46] + l[47] + l[48] + l[49] + l[50] + l[51] + l[5
        ...: 2] + l[53] + l[54] + l[55] + l[56] + l[57] + l[58] + l[59] + l[60] + l[61] + l[62] + l[63] + l[64] + l[65] + l[66] + l[67] + l[68] + l[69]
        ...: + l[70] + l[71] + l[72] + l[73] + l[74] + l[75] + l[76] + l[77] + l[78] + l[79] + l[80] + l[81] + l[82] + l[83] + l[84] + l[85] + l[86] + l
        ...: [87] + l[88] + l[89] + l[90] + l[91] + l[92] + l[93] + l[94] + l[95] + l[96] + l[97] + l[98] + l[99] + l[100] + l[101] + l[102] + l[103] +
        ...: l[104] + l[105] + l[106] + l[107] + l[108] + l[109] + l[110] + l[111] + l[112] + l[113] + l[114] + l[115] + l[116] + l[117] + l[118] + l[11
        ...: 9] + l[120] + l[121] + l[122] + l[123] + l[124] + l[125] + l[126] + l[127] + l[128] + l[129] + l[130] + l[131] + l[132] + l[133] + l[134] +
        ...:  l[135] + l[136] + l[137] + l[138] + l[139] + l[140] + l[141] + l[142] + l[143] + l[144] + l[145] + l[146] + l[147] + l[148] + l[149] + l[1
        ...: 50] + l[151] + l[152] + l[153] + l[154] + l[155] + l[156] + l[157] + l[158] + l[159] + l[160] + l[161] + l[162] + l[163] + l[164] + l[165]
        ...: + l[166] + l[167] + l[168] + l[169] + l[170] + l[171] + l[172] + l[173] + l[174] + l[175] + l[176] + l[177] + l[178] + l[179] + l[180] + l[
        ...: 181] + l[182] + l[183] + l[184] + l[185] + l[186] + l[187] + l[188] + l[189] + l[190] + l[191] + l[192] + l[193] + l[194] + l[195] + l[196]
        ...:  + l[197] + l[198] + l[199] + l[200] + l[201] + l[202] + l[203] + l[204] + l[205] + l[206] + l[207] + l[208] + l[209] + l[210] + l[211] + l
        ...: [212] + l[213] + l[214] + l[215] + l[216] + l[217] + l[218] + l[219] + l[220] + l[221] + l[222] + l[223] + l[224] + l[225] + l[226] + l[227
        ...: ] + l[228] + l[229] + l[230] + l[231] + l[232] + l[233] + l[234] + l[235] + l[236] + l[237] + l[238] + l[239] + l[240] + l[241] + l[242] +
        ...: l[243] + l[244] + l[245] + l[246] + l[247] + l[248] + l[249] + l[250] + l[251] + l[252] + l[253] + l[254] + l[255]
    27.9 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [44]: %timeit "".join(l)
    3.71 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

2 comments

fnl 3217 days ago

So the conclusion is: Have an existing list to join? Use str.join(list); Otherwise, operator+ might be your friend, but f-strings are probably cleaner to handle (but you only get those in Py3.6+). As to str.format(): Nah, not good.

Didn't expect str.format() to turn out to be "so" poor, though.

But this conclusion is taken w/o looking into the "old" formatting approach, `"%s%s%s%s" % (a, b, c, d)`, though.

link

dom0 3226 days ago

The reason join is fast for these cases is because join basically converts all iterables to a list first and figures out how much it has to join exactly.

link

dr_zoidberg 3223 days ago

It walks the iterables (no need to convert them to lists), but its main trick is actually knowing how much memory it'll need for the end result, so no re-allocations happen, simply a one pass (through multiple iterables) copy.

link

dom0 3223 days ago

No, it really converts all iterables to lists (there is a utility function in the C API to do just that, because it's kinda hairy to do by hand. PySequence_ToList or so). Otherwise it would not be possible to calculate how much space is required beforehand, since iterables cannot be walked twice.

link