| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by icambron 4671 days ago

> Wouldn't the above expression cause three sequential loops to run?

No, it wouldn't; that's the really important point about LINQ I was, clumsily, trying to express above [1]. Take this admittedly totally contrived example:

  someList
    .Where(i => i % 2 == 0)
    .Select(i => i + 7)
    .Take(5)

This is not equivalent to a bunch of sequential loops. What it is is a bunch nested Enumerators. Here's how it works. It gets the list's Enumerator, which is an interface that has a MoveNext() method and a Current property. In this case, MoveNext() just retrieves the next element of the list. Then Where() call wraps that enumerator with another enumerator [2], but this time its implementation of MoveNext() calls the wrapped MovedNext() until it finds a number divisible by 2, and then sets its Current property to that. That enumerator is wrapped with one whose MoveNext() calls underlying.MoveNext() and sets Current to underlying.Current + 7. Take just sets Current to null after 5 underlying MoveNext() calls.

So all that returns an enumerable, so as written above, it actually hasn't done any real work yet. It's just wrapped some stuff in some other stuff.

Once we walk the enumerable--either by putting a foreach around it or by calling ToList() on it--we start processing list elements. But they come through one at a time as these MoveNext() calls bring through the list items; think of them as working from the inside out, with each MoveNext() call asking for one item, however that layer of the onion has defined "one item". The item is pulled up through the chain, only "leaving" the original list when it's needed. The entire list is traversed at most once, and in our example, possibly far less: the Take(5) stops calling MoveNext() after it's received 5 values, so we stop processing the list after that happens. If someList were the list of natural numbers, we'd only read the first 10 values from the list.

Now, those nested Enumerator calls aren't completely free, but they're not bad either, and you certainly shouldn't be seeing a one second vs microseconds difference. If you craft the chain correctly, it's functionally equivalent to having all of the right short circuitry in the manual for-loop version, and obviously it's way nicer.

So why are you seeing such poor perf on your LINQ chains? Hard to say without looking at them, but a few of pointers are: (1) Never call ToList() or ToDictionary() until the end of your chain. Or anything else that would prematurely "eat" the enumerable. (2) Order the chain so that filters that eliminate the most items go at the end of the chain, similar to how you'd put their equivalent if (...) continue; checks at the beginning of your loop body. (3) Just be cognizant of how LINQ chains actually work.

[1] In the example in the parent, FindAll isn't actually a LINQ method, so there is one extra loop in there. Always use Where() if you're chaining; use FindAll() when you want a simple List -> List transformation.

[2] A detail elided here: each level actually returns an Enumerable and the layer wrapping it does a GetEnumerator() call on that.

1 comments

VexXtreme 4671 days ago

Thank you for this awesome explanation, it really clarifies how Enumerable methods work. I guess this is the real reason behind deferred execution. Still, I can't help thinking there is a big overhead involved with using such methods. We did some benchmarks in the past and the code that does the same thing manually always ended up being much faster.

The nice thing about Enumerable methods is that they can significantly speed up development and most projects won't suffer for it. However, for speed critical code it's probably not the best tool in the box.

link