Hacker News new | ask | show | jobs
by CodeCube 4845 days ago
About the only time that you have to be really careful with stuff like this is if you're writing something that's super sensitive to GC pauses, such as an XNA game. In those cases, yield return, linq, lambdas, some foreach loops will all generate short lived objects that will cause the GC to kick in more often. So if you're doing that in every update loop you could end up with performance issues. And even that is only on some platforms, as the desktop GC does a great job of dealing with short term garbage.

But in most cases such as ASP.NET MVC actions, or doing an API call in a desktop app, the maintenance and high level code simplification you can get by using these techniques far outweighs the low level "cost" of the code generated by the compilers.

It's good to know what's going on under the covers though.

2 comments

The C# compiler one-ups F# here, and will cache the delegate for lambdas, so long as they don't capture any locals (that is if it's "lifted") - so lambdas don't necessarily mean an extra object. (Although the LINQ methods need enumerables and enumerators.)
When did they introduce caching for lambdas? I've been caching them by hand since I used to see them pop up in CLR Profiler all the time. Is it unable to cache lambdas constructed in member functions because it can't be sure they don't close over 'this'?
It's been years, at least - maybe from the beginning? I just tried a small program with a lambda inside a member function. Works fine, generates IL like:

  IL_0001:  ldarg.0
  IL_0002:  ldsfld     class [mscorlib]System.Func`2<int32,int32> test.Program::'CS$<>9__CachedAnonymousMethodDelegate1'
  IL_0007:  brtrue.s   IL_001c
  // create and store delegate
  IL_001c:  Load delegate from field and call
   
F# will do neat stuff like completely eliminate the lambda, if you're only using it locally. It even does some constant detection across functions, so it can calculate constant functions at compile time. But if not, F# will create a new closure object every time.
Reminds me of an experience I had. I kind of naively wrote something as a method that would "yield return" bytes. After all, everyone is familiar with that attitude so many people have, that you write what looks nicest and worry about bottlenecks later. I'm personally not usually too big on that attitude (I think it's often an overused excuse for obviously bad code) but "yield return" does let you write some very natural-looking stuff.

So later I did compare it to a very simple for loop operating against a byte[], and had both versions work against a 10MB buffer (not unrealistic input in my use case). The throughput of the "yield return" code was something like 5 times less. I didn't try too hard to track down the precise cause of that at the time (maybe it was GC from lots of temporaries as you say? I was pretty CPU bound, and thinking it could have had more to do with a more straightforward loop generating code with fewer jumps after JIT), I just took the faster version.

A for loop will have bounds checking removed, for one. Also, the JIT does better inside single method bodies. Compare:

  1: for(i = 0; i < arr.Length; i++) sum += arr[i];
  
  2: foreach(var x in generate(count)) sum += x;
In the first case, you end up with a small, relatively tight loop (the machine code has a lot of extra stuff I don't quite understand).

In the second case, you're literally doing a virtual call (and I don't think the CLR inlines interface calls) to get_Current() in a loop, followed by MoveNext(). So there's 2 function call overheads, not to mention the actual code that get_Current and MoveNext have.

Here's the sample program[1]. I get about 600% runtime for the enumerable version versus the array. Given all the extra work, I'm sorta impressed it's only 6x. The 32-bit JIT is a bit slower doing the array method than the 64-bit, which surprises me. Here's the code for the loops[2].

1: http://pastebin.com/GAg3RiNx

2: http://pastebin.com/index/a1eC8sX9

A "yield return enumerable" is "delay executed" so evaluating it multiple times causes the yield-return execution to occur at every evaluation (if you haven't taken care to ToArray() or ToList() it). Do something like:

  var myBytes = GetMyBytesItr();
  for (var i = 0; i < myBytes.Count(); ++i)
  {
    ProcessByte(myBytes.ElementAt(i));
  }
and you'll be in a world of hurt especially if GetMyBytesItr() allocates a memory buffer. Count() causes the "get the bytes data" action to occur, as does every iteration of ElementAt(). Now I'm not saying this is definitely what you were experiencing, but it's a common pitfall. Also, using ElementAt() for each iteration is, of course, completely contrived for this example (you'd want to foreach instead which would cause only one execution of the "get the bytes data" action).
That wasn't the issue in my case. It was foreach (var b in f()). f did not do any allocations, just a loop with a bunch of yield return statements.

This is why I suspected that it was simply worse machine code after JIT. But I wasn't sure of all the details of the code that was inserted on my behalf.

One of the biggest differences is that if you do:

  foreach (var byte in byteArray)
The compiler transforms that into a normal for-loop that accesses the array directly. So that's already an advantage for the array over your enumerator.

The other difference that will have a major performance impact is the way the IEnumerator pattern works, even with generics. For:

  foreach (var b in f())
The generated code looks roughly like this:

  using (var enumerator = f())
  while (enumerator.MoveNext()) {
    var byte = enumerator.get_Current();
  }
As a result, you've gone from 0 method calls per iteration (direct array access) to 2 virtual method calls per iteration (MoveNext and get_Current). An incredibly smart JIT might be able to figure out that the virtual methods are always the same and turn them into static invocations, or even inline them, but I don't think the CLR can do this for you reliably.