Hacker News new | ask | show | jobs
by tfm 3357 days ago
The idea is to reduce the amount of redundant copying of characters: you end up doing a few more concatenations in the outer loop, but the concatenations in the inner loop are of short strings.

Importantly, if you remove the restriction of the input list being "exactly 256 items", then the method is still quadratic.

A linear-time algorithm for this would copy each input character exactly once, which is effectively what the method based on array.tostring() does.

The chunk size of 16 is not as significant as the technique of constructing+concatenating chunks, although it is optimal for input length 256. In general I think you'd want a chunk size about the square-root of the expected input length, to minimise the number of copied characters.

EDIT: maths

Concatenating strings of length M and N is linear in O(M+N), because that's how many characters you're copying.

Number of characters copied if you construct a string of length N by concatenating one character at a time

  = (0+1) + (1+1) + (2+1) + (3+1) + ... + ((N-1)+1) + (N+1)
  = (N +1)*N / 2
Number of characters copied if you construct a string of length N by concatenating a chunk of length 16 each time

  = (0+16) + (16+16) + (32+16) + ... + ((N-16)+16)
  = 16*(0+1) + 16*(1+1) + 16*(2+1) + ...
  = 16 * (N/16 +1)*(N/16) / 2
         ^^^^^^^^^^^^^^^
This is where the "technique" comes in: although the algorithm is still quadratic you're effectively moving a constant factor out the front.

Note that you also have the cost of constructing the chunks each time, which becomes the dominant cost if you have too many chunks.

In general, if you have a length kM string which you construct from k chunks of length M, the number of characters copied

  = M * (k+1)*k / 2  +  k * (M+1)*M / 2
... which (rounding and integer constraints aside) is minimised for M = k, i.e. when the chunk size is the square root of the input length. Hence, for input length 256 we take chunk size 16.
1 comments

I see, that makes sense about reducing the constant. Interesting. Thanks for the great explanation.