| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by asveikau 4891 days ago
	Reminds me of an experience I had. I kind of naively wrote something as a method that would "yield return" bytes. After all, everyone is familiar with that attitude so many people have, that you write what looks nicest and worry about bottlenecks later. I'm personally not usually too big on that attitude (I think it's often an overused excuse for obviously bad code) but "yield return" does let you write some very natural-looking stuff. So later I did compare it to a very simple for loop operating against a byte[], and had both versions work against a 10MB buffer (not unrealistic input in my use case). The throughput of the "yield return" code was something like 5 times less. I didn't try too hard to track down the precise cause of that at the time (maybe it was GC from lots of temporaries as you say? I was pretty CPU bound, and thinking it could have had more to do with a more straightforward loop generating code with fewer jumps after JIT), I just took the faster version.

2 comments

MichaelGG 4891 days ago

A for loop will have bounds checking removed, for one. Also, the JIT does better inside single method bodies. Compare:

  1: for(i = 0; i < arr.Length; i++) sum += arr[i];
  
  2: foreach(var x in generate(count)) sum += x;

In the first case, you end up with a small, relatively tight loop (the machine code has a lot of extra stuff I don't quite understand).

In the second case, you're literally doing a virtual call (and I don't think the CLR inlines interface calls) to get_Current() in a loop, followed by MoveNext(). So there's 2 function call overheads, not to mention the actual code that get_Current and MoveNext have.

Here's the sample program[1]. I get about 600% runtime for the enumerable version versus the array. Given all the extra work, I'm sorta impressed it's only 6x. The 32-bit JIT is a bit slower doing the array method than the 64-bit, which surprises me. Here's the code for the loops[2].

1: http://pastebin.com/GAg3RiNx

2: http://pastebin.com/index/a1eC8sX9

link

xanadohnt 4891 days ago

A "yield return enumerable" is "delay executed" so evaluating it multiple times causes the yield-return execution to occur at every evaluation (if you haven't taken care to ToArray() or ToList() it). Do something like:

  var myBytes = GetMyBytesItr();
  for (var i = 0; i < myBytes.Count(); ++i)
  {
    ProcessByte(myBytes.ElementAt(i));
  }

and you'll be in a world of hurt especially if GetMyBytesItr() allocates a memory buffer. Count() causes the "get the bytes data" action to occur, as does every iteration of ElementAt(). Now I'm not saying this is definitely what you were experiencing, but it's a common pitfall. Also, using ElementAt() for each iteration is, of course, completely contrived for this example (you'd want to foreach instead which would cause only one execution of the "get the bytes data" action).

link

asveikau 4891 days ago

That wasn't the issue in my case. It was foreach (var b in f()). f did not do any allocations, just a loop with a bunch of yield return statements.

This is why I suspected that it was simply worse machine code after JIT. But I wasn't sure of all the details of the code that was inserted on my behalf.

link

kevingadd 4891 days ago

One of the biggest differences is that if you do:

  foreach (var byte in byteArray)

The compiler transforms that into a normal for-loop that accesses the array directly. So that's already an advantage for the array over your enumerator.

The other difference that will have a major performance impact is the way the IEnumerator pattern works, even with generics. For:

  foreach (var b in f())

The generated code looks roughly like this:

  using (var enumerator = f())
  while (enumerator.MoveNext()) {
    var byte = enumerator.get_Current();
  }

As a result, you've gone from 0 method calls per iteration (direct array access) to 2 virtual method calls per iteration (MoveNext and get_Current). An incredibly smart JIT might be able to figure out that the virtual methods are always the same and turn them into static invocations, or even inline them, but I don't think the CLR can do this for you reliably.

link