Hacker News new | ask | show | jobs
by bradtgmurray 6344 days ago
Jeff is not quite on the ball with his reference to the "Shlemiel" algorithm.

Joel originally described the "Shlemiel" case when referring to strings in C not having a length stored in them, and needing to use strlen to walk the string to find the length. The Shlemiel problem comes up when you use strcat in a loop, since each strcat call needs to recalculate the length of the loop, which takes longer as the string gets longer.

The problem with concatenating strings is not a length issue, but a memory issue. In order to concatenate a string in C# or C++ using built in string classes, the location to store the new resulting string has to be allocated and the two strings have to copied into this new location. The allocation does not take up a trivial amount of time, and reallocating memory over and over again for each time you concatenate a string is bad practice. There is still a bit of "Shlemiel" since you have to keep copying your existing string, which takes a larger amount of time as the string gets longer, but you have to take in consideration the allocation, which wasn't present in the original problem that the "Shlemiel" story described.

2 comments

Memory allocation is something that is typically O(1), or O(log(n)). The constants attached are fairly large, but doing lots of allocations/deallocations isn't a BigO issue.

The schlemiel problem still exists when doing '+=' in a loop. Each copy requires some code to walk the length of both sides of the operator, same as strlen in Joel's example. The loop is an O(n^2) operation, even if it has a smaller constant on it than a memcpy(src, dest, strlen(src)).

It's "Schlemiel", but with different constants attached. For big enough N the O(1) or O(log(n)) of the allocations lose importance making the entire thing O(n^2) just like "Schlemiel".

So we may be saying the same thing, but, he didn't say allocation was N log N --- and allocation/deallocation is really expensive; it's consistently on the top of profiles of programs I work on.
... allocation/deallocation is really expensive ...

The cost of allocating memory in .NET (and probably other GC environments) is basically equivalent to the cost of incrementing a pointer, which is to say, basically free. The reason is because the .NET garbage collector periodically "packs together" objects in memory, so the memory allocation algorithm essentially becomes:

  result = _freeMemoryPtr;
  _freeMemoryPtr += numBytesToAllocate;
  return result;
... in other words, since objects in memory are periodically "packed together" by the GC, there isn't a lot of memory fragmentation, so the allocator can simply allocate from the end of the heap. It doesn't need to try to find a "large enough free block" within fragmented memory, because memory isn't fragmented.

At least, that is my understanding. I could be wrong because I haven't investigated this thoroughly yet.

http://en.wikipedia.org/wiki/Garbage_collection_(computer_sc...

You're right; I should have said, "malloc/free is really expensive".
Just curious, what do you do? It's interesting that you optimize a lot of C/C++ programs... game development maybe?
Network security software, reverse engineering, and (a bit earlier in my career) streaming video/multicast.

I got a 7x performance improvement in a performance-critical component at Arbor Networks within a couple weeks of starting there just by profiling, finding malloc/free at the top of the profile, and switching to an arena allocator. I have other malloc stories, but that's the simplest. I've learned to fear malloc.

My guess is something cryptographic.
The reference is entirely valid. Both Jeff and Joel are using the joke allegorically; Joel uses it to describe an algorithm that counts the length of the first part of the string over and over, while Jeff uses it to describe an algorithm that copies the first part of the string over and over. (The joke itself is an old one, not original with Joel.)