LINQ is better than foreach

I use LINQ as much as possible in my code. Any time I write any kind of loop, I re-evaluate whether I should be writing it using LINQ.

Most C# developers think LINQ is some fancy new concept, but in reality it is just an implementation of Haskell list comprehensions, ported to C# by Erik Meijer. I love watching imperative and functional languages converge like this.

DrJokepu 5924 days ago

You (and the writer of the article) of course mean Linq to Objects. Linq providers are free to compile Linq queries to whatever they want: in the case of objects that means something equivalent to list comprehensions, but in the case of Linq to SQL, for example, it all compiles down to SQL queries.

(Sorry for nitpicking; I'm pointing this out because people get confused by the unfortunate Linq terminology all the time.)

kodefuguru 5924 days ago

IQueryables get converted to expression trees, which then get converted to SQL. It's more "translated" to SQL than "compiled" to SQL. The power of IQueryable is that expression trees can be converted to so many different kinds of expressions, and you can write your own providers.

IEnumerable's extensions are mostly iterators over sequences.
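A minimal sketch of what "iterators over sequences" means in practice, assuming a modern .NET SDK (C# 9 or later, for top-level statements). LINQ to Objects operators such as Where are lazy: the predicate runs only when the result is enumerated, and it runs again on every enumeration. The counter exists purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many times the predicate actually runs.
int predicateCalls = 0;
var numbers = new List<int> { 1, 2, 3, 4, 5 };

// Building the query runs nothing yet: Where just wraps the source in an iterator.
IEnumerable<int> evens = numbers.Where(n => { predicateCalls++; return n % 2 == 0; });
int callsAfterBuilding = predicateCalls;      // still 0

// Enumerating pulls each item through the iterator.
List<int> firstPass = evens.ToList();         // [2, 4]; predicate ran 5 times
int callsAfterFirstPass = predicateCalls;

// The query is re-evaluated on every enumeration and sees later changes to the source.
numbers.Add(6);
List<int> secondPass = evens.ToList();        // [2, 4, 6]
int callsAfterSecondPass = predicateCalls;    // 5 + 6 = 11

Console.WriteLine($"{callsAfterBuilding} {callsAfterFirstPass} {callsAfterSecondPass}"); // 0 5 11
Console.WriteLine(string.Join(",", secondPass));                                         // 2,4,6
```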

btilly 5924 days ago

> IQueryables get converted to expression trees which then get converted to sql. It's more of a "translated" to sql rather than a "compiled" to sql.

You've just described how compilers work, minus the optional attempts at optimization and the optional extra passes, most of which take place either after the expression tree has been generated or somewhere during the conversion process. So I'd call what you've described "compilation".

crc32 5924 days ago

Compilation is translation.

I find it somewhat ironic that you try to specify that the term LINQ be limited to LINQ to Objects and then proceed to explain how it doesn't matter whether the provider is for objects or SQL. Granted, his examples are all LINQ to Objects, but I see no need to be that specific. The provider could be Objects, SQL, Parallel Objects, etc. But the LINQ is still the same.

DrJokepu 5924 days ago

Let me clarify what I meant. Linq to SQL is compiled to SQL queries. SQL queries are not list comprehensions, not even on the server side (apart from trivial cases). I wasn't trying to suggest that the term "Linq" is limited to Linq to Objects; I was trying to say exactly the opposite.

I don't follow why you are trying to specify the provider's implementation of the list comprehension when it doesn't matter to the LINQ query you write. LINQ works on lists and provides standard query operators. When you write a LINQ query it's turned into an expression tree. Based on the provider of the original list, the query is turned into an actual implementation of your LINQ query. Whether this is ultimately translated to a SQL query, an LDAP query, assembly instructions, etc. is of little concern to the writer of the LINQ query. Of course, there are cases where you do care, such as when performance matters.

I believe the author's point was that any time you iterate over a list, you should really think about whether you can replace the iteration with a LINQ query. It really doesn't matter whether the list being iterated is an underlying list of objects, a table in an RDBMS, etc.

nlawalker 5924 days ago

> When you write a LINQ query it's turned into an expression tree.

I believe that this is not the case if you are using LINQ to Objects unless you stick an explicit cast in there somewhere, but I could be wrong. I think that if you simply use the LINQ extension methods on an in-memory IEnumerable, they are simple method calls and involve no compilation to an expression tree.
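nlawalker's reading matches how the compiler treats lambdas: the target type decides whether a lambda becomes an ordinary delegate or an expression tree. The sketch below contrasts the two. It assumes a modern .NET SDK with top-level statements, the sample list is made up, and AsQueryable() stands in for a real provider such as LINQ to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// The same lambda text becomes two different things depending on its target type.
Func<int, bool> asDelegate = x => x > 1;              // ordinary delegate: calling it runs compiled code
Expression<Func<int, bool>> asTree = x => x > 1;      // data structure describing the lambda

// The tree can be inspected (this is what a LINQ provider walks to produce SQL)...
var body = (BinaryExpression)asTree.Body;
Console.WriteLine(body.NodeType);                     // GreaterThan
// ...or compiled back into a delegate.
Func<int, bool> compiled = asTree.Compile();

var source = new List<int> { 1, 2, 3 };

// LINQ to Objects: Enumerable.Where takes a Func, so no expression tree is built.
IEnumerable<int> inMemory = source.Where(x => x > 1);

// On an IQueryable<T> the compiler binds Queryable.Where, which takes an Expression
// and records the call instead of running it.
IQueryable<int> queryable = source.AsQueryable().Where(x => x > 1);
var call = (MethodCallExpression)queryable.Expression;
Console.WriteLine(call.Method.Name);                  // Where

Console.WriteLine(string.Join(",", inMemory));        // 2,3
Console.WriteLine(string.Join(",", queryable));       // 2,3 (the in-memory provider compiles and runs the tree)
```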

Zak 5924 days ago

The Linq version is clearly superior to the imperative version, but I find the following even easier to understand:

  (into [] (map full-name
                (sort-by :last-name (filter #(= (:role %) 'developer) employees))))

It's good to see mainstream languages adding declarative constructs though.
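For readers more at home in one notation than the other, here is a rough mapping of Zak's Clojure pipeline onto C# extension methods. It assumes a modern .NET SDK. The employee data are hypothetical anonymous objects, and full-name is assumed to mean first name plus last name.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; anonymous types keep the sketch self-contained.
var employees = new[]
{
    new { FirstName = "Ada",   LastName = "Lovelace", Role = "developer" },
    new { FirstName = "Grace", LastName = "Hopper",   Role = "developer" },
    new { FirstName = "Alan",  LastName = "Turing",   Role = "manager"   },
};

// (filter #(= (:role %) 'developer) employees)  ->  Where
// (sort-by :last-name ...)                      ->  OrderBy
// (map full-name ...)                           ->  Select
// (into [] ...)                                 ->  ToArray
string[] developerNames = employees
    .Where(e => e.Role == "developer")
    .OrderBy(e => e.LastName)
    .Select(e => e.FirstName + " " + e.LastName)
    .ToArray();

Console.WriteLine(string.Join(", ", developerNames)); // Grace Hopper, Ada Lovelace
```

The Clojure form reads inside-out, while the method chain reads top to bottom in the order the steps run.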

> but I find the following even easier to understand

Entirely subjective. To me, "#(= (:" is pure line noise.

Zak 5924 days ago

It would look that way if you don't know Clojure's quick and dirty function literal syntax. Another way to write it is:

  (fn [x] (= (:role x) 'developer))

I don't know Clojure. "(= (:" is equally line noise.

This is an absurd discussion. If you don't know C#, "x =>" is line noise. (More fairly, since (= (x y) z) is trivial to understand if you've ever seen any kind of Lisp, I should say instead -- if you don't know any C-like languages, "List<int> { 1, 2, 3 };" is line noise.)

I suggest you learn the language you program in; problem solved.

The point is that, in these cases, "easier to understand" is so subjective, and so dependent on prior experience, that it is absolutely meaningless.

rayvega 5924 days ago

In simple scenarios I personally prefer List<T>.ConvertAll:

    var foo = new List<int>{ 1, 2, 3};
    var bar = foo.ConvertAll(x => x * 2);

over the Linq equivalent:

    var foo = new List<int>{ 1, 2, 3};
    var bar = from x in foo select x * 2;

That's the same as

    var foo = new List<int> {1, 2, 3};
    var bar = foo.Select(x => x*2);

nlawalker 5924 days ago

Yep, one thing to realize is that LINQ does not simply refer to the query syntax - it refers to a set of technologies, including expression trees and the ability to compile them to alternate languages like TSQL, the query syntax, and the extension methods like Select.

One small difference with your example though is that Select requires "using System.Linq", and ConvertAll is defined on List<T>.

bluetech 5924 days ago

And of course, the LINQ version returns an IEnumerable<TResult>, while the List version returns a List<TOutput>. You could chain a .ToList() there to make them equivalent (if needed).
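Besides the return type, the two differ in when the work happens. ConvertAll builds a new list immediately. Select, and the equivalent query expression, is lazy and re-reads the source each time it is enumerated. A small sketch, assuming a modern .NET SDK:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var foo = new List<int> { 1, 2, 3 };

// List<T>.ConvertAll: defined on List<T> itself, eager, returns a new List<TOutput>.
List<int> converted = foo.ConvertAll(x => x * 2);

// Enumerable.Select (needs System.Linq): lazy, returns IEnumerable<TResult>.
IEnumerable<int> selected = foo.Select(x => x * 2);

// Query syntax compiles to the same Select call.
IEnumerable<int> queried = from x in foo select x * 2;

// ToList() materializes a snapshot, like ConvertAll.
List<int> selectedList = foo.Select(x => x * 2).ToList();

foo.Add(4);

// The lazy sequences see the new element; the materialized lists do not.
Console.WriteLine(string.Join(",", converted));    // 2,4,6
Console.WriteLine(string.Join(",", selected));     // 2,4,6,8
Console.WriteLine(string.Join(",", queried));      // 2,4,6,8
Console.WriteLine(string.Join(",", selectedList)); // 2,4,6
```

Calling .ToList() on the Select result gives the same eager snapshot that ConvertAll returns.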

cema 5924 days ago

Clojure is compact and expressive, but there is a learning curve. Having data-centered and function-oriented programming features available for the mainstream is a great step forward. (Think mainstream programmers, not just mainstream languages.)

enntwo 5924 days ago

One small caution when using LINQ, especially in performance-critical areas (XNA, for instance): .Count and .Count() can differ dramatically in performance. When simply looking for the size of a list, be sure to use the .Count property (or .Length for an array), as it is orders of magnitude faster. If you are using LINQ elsewhere, .Count() will give the same result but much more slowly, so it can be a tough optimization to track down.

chrisb 5924 days ago

The .Count() method in LINQ to objects checks to see if the IEnumerable implements ICollection, and if so then just calls the .Count property.

Of course, calling the .Count property yourself is more efficient, but not by orders of magnitude.
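The behaviour chrisb describes can be observed directly. This sketch counts how many elements each approach visits. It assumes a modern .NET SDK, and the Iterate helper is made up for illustration. Count() on a List<T> takes the ICollection<T> shortcut, whereas on a plain iterator it has to walk every element, which is where a real cost appears.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };
int elementsVisited = 0;

// Hypothetical helper: a plain iterator does not implement ICollection<T>,
// so Count() has no shortcut and must enumerate it.
IEnumerable<int> Iterate(IEnumerable<int> source)
{
    foreach (var item in source)
    {
        elementsVisited++;
        yield return item;
    }
}

int viaProperty = list.Count;   // property on List<T>
int viaMethod = list.Count();   // Enumerable.Count sees ICollection<T> and reads its Count
int visitedAfterList = elementsVisited;

int viaIterator = Iterate(list).Count();  // enumerates every element
int visitedAfterIterator = elementsVisited;

Console.WriteLine($"{viaProperty} {viaMethod} {viaIterator}");   // 3 3 3
Console.WriteLine($"{visitedAfterList} {visitedAfterIterator}"); // 0 3
```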

enntwo 5924 days ago

Thank you for this; apparently it was fixed in .NET 4.0, which I was unaware of.

I think my post still has some validity, in that it's easy to confuse direct property access with routing through an additional tool or library, but the fix definitely eliminates the performance hit I was discussing, so much of my post was in error.

edit: It seems this also applies to 3.5. I need to keep my concerns more up to date.

kodefuguru 5924 days ago

Everyone gets that wrong, including Microsoft's documentation ;). It's a good optimization though, in spite of the LSP violation.

MartinCron 5924 days ago

Similarly, it is very easy, intuitive, and horribly inefficient to write something like:

    int countOfThings = DatabaseRepositoryType.GetThings().Count();

It will go to the database, pull all of the matching rows locally, create all of the objects, and then iterate through them just to increment a counter. Creating a .GetCount() method that calls down to a SQL "SELECT COUNT(*)" query is more work to write, but runs about a thousand times faster.

chrisb 5924 days ago

This is not quite correct.

The SQL generated in this case depends on the return type of GetThings(). If it returns an IEnumerable<>, then it is indeed a terrible thing to do, but if it returns an IQueryable<>, then the SQL will be a SELECT COUNT(*) query, and no objects will be created.

Care does have to be taken when writing this sort of code!
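chrisb's point is that overload resolution depends on the static type. The same .Where or .Count() call binds either to Queryable, which builds a query for the provider, or to Enumerable, which runs in memory. GetThings and DatabaseRepositoryType are MartinCron's hypothetical names. In this sketch an AsQueryable() list stands in for a database-backed IQueryable<T>, so no SQL is actually generated. It assumes a modern .NET SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for a database-backed source. A real LINQ to SQL table would be
// an IQueryable<T> whose provider translates the query to SQL.
IQueryable<int> things = new List<int> { 1, 2, 3, 4, 5 }.AsQueryable();

// Static type IQueryable<T>: the compiler binds Queryable.Where, and the pipeline stays
// a query that a SQL provider could translate (e.g. into SELECT COUNT(*) ... WHERE ...).
var stillQueryable = things.Where(x => x > 2);

// Static type IEnumerable<T>: the same-looking call binds Enumerable.Where, so from here
// on everything runs in memory, after the rows have been fetched.
IEnumerable<int> asEnumerable = things;
var nowInMemory = asEnumerable.Where(x => x > 2);

Console.WriteLine(stillQueryable is IQueryable<int>);             // True
Console.WriteLine(nowInMemory is IQueryable<int>);                // False
Console.WriteLine(stillQueryable.Count() == nowInMemory.Count()); // True: same answer, computed in different places
```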

boblol123 5924 days ago

The number of objects is far too small to provide a very meaningful result.

kodefuguru 5924 days ago

I changed it to run with one million names and ran with the optimized foreach. I only did one pass.

    foreach  00:00:01.7348710
    linq     00:00:01.8607973
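For anyone repeating this kind of measurement, the sketch below is a slightly fairer single-process harness. It assumes a modern .NET SDK and hypothetical data (random last names, every third employee a developer). It warms each version up once before timing, makes both versions sort with the same ordinal comparison, and checks that their results agree. It prints timings but makes no claim about what they will be. A dedicated tool such as BenchmarkDotNet handles repetition and statistics far better.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int N = 1_000_000; // the size kodefuguru reports using

// Hypothetical data: random last names, every third employee a developer.
var random = new Random(42);
var employees = Enumerable.Range(0, N)
    .Select(i => (LastName: "Name" + random.Next(N), Role: i % 3 == 0 ? "developer" : "other"))
    .ToArray();

// Hand-written version: filter, sort, project.
string[] WithForeach()
{
    var matches = new List<(string LastName, string Role)>();
    foreach (var e in employees)
        if (e.Role == "developer") matches.Add(e);
    matches.Sort((a, b) => string.CompareOrdinal(a.LastName, b.LastName));
    var names = new string[matches.Count];
    for (int i = 0; i < matches.Count; i++) names[i] = matches[i].LastName;
    return names;
}

// LINQ version using the same ordinal ordering, so the outputs are comparable.
string[] WithLinq() => employees
    .Where(e => e.Role == "developer")
    .OrderBy(e => e.LastName, StringComparer.Ordinal)
    .Select(e => e.LastName)
    .ToArray();

// Run once untimed (JIT compilation, first-touch costs), then time a second run.
TimeSpan Time(Func<string[]> run, out string[] result)
{
    run();
    var stopwatch = Stopwatch.StartNew();
    result = run();
    stopwatch.Stop();
    return stopwatch.Elapsed;
}

TimeSpan foreachTime = Time(WithForeach, out string[] foreachResult);
TimeSpan linqTime = Time(WithLinq, out string[] linqResult);

Console.WriteLine($"foreach {foreachTime} linq {linqTime}");
Console.WriteLine(foreachResult.SequenceEqual(linqResult)); // the two versions must agree
```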

zokier 5924 days ago

Yeah, I got that feeling too, when time measurements are in the microsecond range. I wonder how CPU cache and memory access patterns affect the results.

axod 5924 days ago

FWIW I find the LINQ far harder to read.

  var developerNames = employees.AsParallel().AsOrdered()
                              .Where(e => e.Role == Role.Developer)
                              .OrderBy(e => e.LastName)
                              .Select(e => e.FullName)
                              .ToArray();

That's just gross IMHO. AsParallel(), yuck. AsOrdered(), eugh. Why are these functions being used to set parameters?

enntwo 5924 days ago

Many people find function chaining more intuitive to read, or come to enjoy it after being exposed to it for a while. Functions are used to set flags or parameters because of one of the basic principles of functional programming: avoiding side effects.

In the above example, the state of employees is the same after the statement executes as it was before. If you were to set a "Parallel" flag or an "Ordered" flag beforehand, each in its own assignment statement, you would have modified the original object and created a side effect.
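A small check of this point, assuming a modern .NET SDK and a made-up list of names. AsParallel() and AsOrdered() return new ParallelQuery<T> wrappers rather than changing the source, so the list is identical before and after the query runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var employees = new List<string> { "Lovelace", "Hopper", "Turing" };

// AsParallel() and AsOrdered() wrap the source in new ParallelQuery<T> objects;
// they do not set a flag on `employees` itself.
ParallelQuery<string> query = employees.AsParallel().AsOrdered();

string[] sorted = query.OrderBy(name => name).ToArray();

// The original list is untouched: same type, same contents, same order.
Console.WriteLine(employees.GetType().Name);    // List`1
Console.WriteLine(string.Join(",", employees)); // Lovelace,Hopper,Turing
Console.WriteLine(string.Join(",", sorted));    // Hopper,Lovelace,Turing
```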

I am not challenging your opinion, but simply answering your question as to why.

zokier 5924 days ago

I, for one, find that kind of function chaining somewhat unintuitive even though I understand the logic behind it. I expect methods of an object to modify that object, not to return a modified version of it.

I'm not sure if you find it easier to read, but this is analogous:

  var developerNames = from emp in employees.AsParallel().AsOrdered()
                       where emp.Role == Role.Developer
                       orderby emp.LastName
                       select emp.FullName;

axod 5924 days ago

Nope, if anything that's worse. But thanks, interesting to see another example.

I think it only looks gross because they're methods on an object, and you're used to that meaning "imperative mutable thing." Compare this:

    employees.AsParallel().AsOrdered()

to the more-traditional-C-style

    MaintainOrder(Parallelize(employees))

or the admittedly more attractive

    (maintain-order (parallelize employees))

and suddenly it seems perfectly sensible. I thought it was ugly first, too, but now I'm used to it.
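In C# the equivalence is literal: extension-method syntax compiles into nested static calls. This sketch writes the same PLINQ pipeline both ways using the real ParallelEnumerable methods. It assumes a modern .NET SDK, and the sample names are hypothetical.

```csharp
using System;
using System.Linq;

var employees = new[] { "Lovelace", "Hopper", "Turing" };

// Method-chain style: reads left to right.
var chained = employees.AsParallel().AsOrdered().Select(n => n.ToUpperInvariant());

// The same calls written as nested static invocations, which is what the
// compiler turns the extension-method syntax into.
var nested = ParallelEnumerable.Select(
    ParallelEnumerable.AsOrdered(ParallelEnumerable.AsParallel(employees)),
    n => n.ToUpperInvariant());

Console.WriteLine(string.Join(",", chained)); // LOVELACE,HOPPER,TURING
Console.WriteLine(string.Join(",", nested));  // LOVELACE,HOPPER,TURING
```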

wlievens 5924 days ago

How is

    (maintain-order (parallelize employees))

"admittedly more attractive" than

    MaintainOrder(Parallelize(employees))

I enjoy functional constructs, but aren't you just talking about syntax now?

cema 5924 days ago

I agree, this is a strange claim. On the other hand (since we are on Clojure anyway), something like

  (->> employees parallelize maintain-order)

might be said to be more attractive because it looks more linear: the sequential order of application is maintained, and the parentheses, which are perceived as additional levels of hierarchy, are removed.

Of course, the dot-dot-dot style in Java/C#/etc is doing the same:

  employees.Parallelize().MaintainOrder()

is also linear, the parentheses are only used to specify parameters.

I'm just being a smug weenie, is all. Obviously there's no real difference here. A better question is whether this style or the "pipeline" style (the ->> example above, and the C# method chain) is more pleasant.

axod 5924 days ago

I think one of the problems I have with it, is that AsParallel() doesn't seem to me like it should be a function. It should be an argument to a function. I dislike that.

Depends on perception. What about

    someFunction.Memoize();

That's perfectly natural, right? (I mean, you could claim that "is memoized or not" should be a "flag" on a function, but that's starting to sound a little silly to my ears.) It's exactly analogous.
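To make the analogy concrete: .NET delegates have no built-in Memoize, so the helper below is hypothetical. It is written as a local function to keep the sketch self-contained; as an extension method it would read slowSquare.Memoize(). It returns a new, caching function and leaves the original untouched, the same pattern as AsParallel(). This assumes a modern .NET SDK, and the cache is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

int underlyingCalls = 0;
Func<int, int> slowSquare = x => { underlyingCalls++; return x * x; };

// Hypothetical helper: does not flip a flag on the input function; it returns
// a new function that wraps the original with a (non-thread-safe) cache.
Func<T, TResult> Memoize<T, TResult>(Func<T, TResult> f) where T : notnull
{
    var cache = new Dictionary<T, TResult>();
    return x => cache.TryGetValue(x, out var cached) ? cached : (cache[x] = f(x));
}

Func<int, int> fastSquare = Memoize(slowSquare);

int a = fastSquare(12);  // computed, calls slowSquare
int b = fastSquare(12);  // served from the cache
int c = slowSquare(12);  // the original is unchanged and still uncached

Console.WriteLine($"{a} {b} {c} {underlyingCalls}"); // 144 144 144 2
```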

cema 5924 days ago

Is it just the name? AsParallel versus ToParallel, for example?

rbanffy 5924 days ago

I find it interesting the developer has to worry about the parallelization. Shouldn't the data persistence layer (be it a relational database or some other data engine) deal with that (and use parallelization whenever possible)?

sparky 5924 days ago

Ideally, the tools should figure all this out. In practice, figuring out whether there is enough work for parallelism to save you time is a hard problem, and programmer annotations help a lot. Additionally, PLINQ (parallel LINQ) is different from single-threaded LINQ in ways that can affect your program. For instance, it may return the results of a query in a different order than they were inserted into the collection. Some programs rely on the ordering in LINQ, so PLINQ is opt-in, rather than opt-out or tools-decide, at this point. There is more detail on this issue and some others here ( http://msdn.microsoft.com/en-us/magazine/cc163329.aspx ).
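The ordering difference is easy to observe, and AsOrdered() is the opt-in that restores source order. A sketch, assuming a modern .NET SDK. Whether the unordered result happens to come back in order varies from run to run, so only the ordered comparisons are guaranteed.

```csharp
using System;
using System.Linq;

int[] inserted = Enumerable.Range(0, 10_000).ToArray();

// Sequential LINQ to Objects: results come back in source order.
int[] sequential = inserted.Where(n => n % 3 == 0).ToArray();

// PLINQ without AsOrdered(): partitions are processed concurrently, and the
// merged results are not guaranteed to be in source order.
int[] unordered = inserted.AsParallel().Where(n => n % 3 == 0).ToArray();

// Opting back in to source order (at some cost in buffering and coordination).
int[] ordered = inserted.AsParallel().AsOrdered().Where(n => n % 3 == 0).ToArray();

Console.WriteLine(sequential.SequenceEqual(ordered));                   // True
Console.WriteLine(sequential.SequenceEqual(unordered.OrderBy(n => n))); // True: same elements
Console.WriteLine(sequential.SequenceEqual(unordered));                 // may be True or False
```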

rbanffy 5924 days ago

Again, ideally, this decision should be passed down to the underlying datastore. It's only at that level that you have the reliable information on how the data is distributed among cores.

On what kinds of back-end do LINQ and PLINQ support parallelization?

sparky 5923 days ago

I'm not understanding what underlying datastore you're referring to in the case of LINQ-to-Objects or LINQ-to-XML. These frameworks perform "queries" on ordinary objects or XML data stored in memory within the .Net process, not on some other machine as with LINQ-to-SQL. You use LINQ-to-Objects in place of a filtering foreach loop, and LINQ-to-XML in place of XPath et al.

PLINQ is just a parallel implementation of LINQ-to-Objects and LINQ-to-XML. It works in the same scenarios .Net programs work in, i.e. multicores or SMPs running Windows (not sure if PLINQ works with Mono).

nlawalker 5924 days ago

I really like the LINQ extension methods and the functionality they provide, but I don't really like the query syntax. Method calls and lambdas seem so expressive and concise to me that I don't know why I'd start pretending like I was using TSQL to do operations on enumerables. I can definitely see its draw for some people though.
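The query syntax is only a front end: the compiler rewrites it into the same extension-method calls, so choosing between them is purely a matter of taste. A sketch with hypothetical employee data, assuming a modern .NET SDK:

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var employees = new[]
{
    new { LastName = "Turing",   FullName = "Alan Turing",  IsDeveloper = false },
    new { LastName = "Lovelace", FullName = "Ada Lovelace", IsDeveloper = true  },
    new { LastName = "Hopper",   FullName = "Grace Hopper", IsDeveloper = true  },
};

// Query syntax...
var viaQuery = from e in employees
               where e.IsDeveloper
               orderby e.LastName
               select e.FullName;

// ...is rewritten by the compiler into these extension-method calls.
var viaMethods = employees
    .Where(e => e.IsDeveloper)
    .OrderBy(e => e.LastName)
    .Select(e => e.FullName);

Console.WriteLine(string.Join(", ", viaQuery));        // Grace Hopper, Ada Lovelace
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```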

btilly 5924 days ago

Any time you see toy benchmarks, ask yourself what was left out. The potential performance issue with LINQ comes when you are accessing database data over a connection with non-trivial latency. Then it is easy to wind up accidentally issuing large numbers of queries, with a lot of round trips, that winds up being slow and unnecessarily hard on the database.

LINQ is hardly unusual in having this risk. It is an easy mistake to make in many environments with many toolsets, and it is not particularly hard to avoid it with LINQ. But the toy benchmark doesn't address it. And the article makes fun of a software architect who has likely had bad experiences with developers messing up on this.
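The round-trip problem described here is often called the "N+1 queries" pattern. The sketch below simulates it without a database. LoadCustomer and LoadCustomers are hypothetical stand-ins that only count calls, where a real data layer would send one query per call. It assumes a modern .NET SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory "database" that counts round trips.
var orders = new[] { (Id: 1, CustomerId: 10), (Id: 2, CustomerId: 11), (Id: 3, CustomerId: 10) };
var customers = new Dictionary<int, string> { [10] = "Ann", [11] = "Bob" };
int roundTrips = 0;

// Hypothetical: one query per call.
string LoadCustomer(int id) { roundTrips++; return customers[id]; }

// Hypothetical: one batched query, e.g. WHERE Id IN (...).
Dictionary<int, string> LoadCustomers(IEnumerable<int> ids)
{
    roundTrips++;
    return ids.Distinct().ToDictionary(id => id, id => customers[id]);
}

// Innocent-looking projection: one round trip per order.
var namesOneByOne = orders.Select(o => LoadCustomer(o.CustomerId)).ToList();
int tripsOneByOne = roundTrips;

// Batched: fetch everything needed at once, then join in memory.
roundTrips = 0;
var lookup = LoadCustomers(orders.Select(o => o.CustomerId));
var namesBatched = orders.Select(o => lookup[o.CustomerId]).ToList();
int tripsBatched = roundTrips;

Console.WriteLine($"{tripsOneByOne} vs {tripsBatched}");      // 3 vs 1
Console.WriteLine(namesOneByOne.SequenceEqual(namesBatched)); // True
```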

sparky 5924 days ago

True for LINQ to SQL, not so much for LINQ to Objects as evaluated in the benchmarks. Of course, one toy benchmark over 1k objects doesn't fully characterize its performance, but at least we don't have to worry about the database latency free variable.

> accessing database data over a connection with non-trivial latency. ... But the toy benchmark doesn't address it

I disagree entirely. I know when I'm using LINQ to SQL, and when I'm using LINQ to objects. My experience is that LINQ to objects is more common and very useful. He's using larger data volumes than I generally am, so there's absolutely nothing "toy" about it.

btilly 5924 days ago

Toy benchmarks frequently use large data volumes. That is easy to do. What is harder to do in a benchmark is providing realistically complex logic to go with those volumes. Many systems work well on the simple cases, but have bad edge cases that can get tickled by a more complex problem.

But the logic that I have successfully used LINQ on:

- operates on lists of objects, not database queries.

- operates on small quantities of data. It doesn't matter if there are only five items in a list: if you have to filter them, you have to filter them.

- is not that complex.

We use foreaches as well, or a combination. LINQ is great for automating simple things, e.g. turning a loop that finds an item into a .FirstOrDefault(x => x.SomeCondition). It would be interesting to see whether, as our grasp of LINQ improves, we run into any of these supposed corner cases. But it hasn't happened yet.
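Here is the FirstOrDefault rewrite mentioned above in a minimal sketch. It assumes a modern .NET SDK, and Active stands in for the hypothetical SomeCondition. Both versions stop at the first match and fall back to default(T) when nothing matches. For reference types that default is null, so check it before dereferencing.

```csharp
using System;
using System.Linq;

var items = new[] { (Name: "a", Active: false), (Name: "b", Active: true), (Name: "c", Active: true) };

// The hand-written search loop...
(string Name, bool Active) foundByLoop = default;
foreach (var item in items)
{
    if (item.Active)
    {
        foundByLoop = item;
        break;
    }
}

// ...collapses to one call with the same early-exit behaviour.
var foundByLinq = items.FirstOrDefault(x => x.Active);

Console.WriteLine(foundByLoop.Name);           // b
Console.WriteLine(foundByLoop == foundByLinq); // True
```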