I use LINQ as much as possible in my code. Any time I do any type of looping construct I re-evaluate whether I should be writing it using LINQ.
Most C# developers think LINQ is some fancy new concept, but in reality it is just an implementation of Haskell list comprehensions ported to C# by Erik Meijer. I love watching the imperative and functional languages converge like this.
You (and the writer of the article), of course, mean Linq to Objects. Linq providers are free to compile Linq structures to whatever they want. In the case of objects that means compiling to list comprehensions, but in the case of Linq to SQL, for example, it all compiles down to SQL queries.
(Sorry for nitpicking; I'm pointing this out because people get confused about the unfortunate Linq terminology all the time.)
IQueryables get converted to expression trees, which then get converted to SQL. It's more "translated" to SQL than "compiled" to SQL. The power of IQueryable is that the expression trees can be converted to so many different kinds of expressions, and you can write your own.
IEnumerable's extensions are mostly iterators over sequences.
> IQueryables get converted to expression trees, which then get converted to SQL. It's more "translated" to SQL than "compiled" to SQL.
You've just described how compilers work, minus optional attempts at optimization and optional excess passes. Most of which take place either after the expression tree has been generated, or somewhere during the conversion process. Therefore I'd call what you've described "compilation".
I find it somewhat ironic that you try to specify that the term LINQ be limited to LINQ to Objects and then proceed to explain how it doesn't matter whether the provider is for objects or SQL. Granted, his examples are all LINQ to Objects, but I see no need to be that specific. The provider could be Objects, SQL, Parallel Objects, etc. But the LINQ is still the same.
Let me clarify what I meant. Linq to SQL is compiled to SQL queries. SQL queries are not list comprehensions, not even on the server side (apart from trivial cases). I wasn't trying to suggest that the term "Linq" is limited to Linq to Objects; I was trying to say exactly the opposite.
I'm not following why you are trying to specify the provider's implementation of the list comprehension when it doesn't matter to the LINQ query you write. LINQ works on lists and provides standard query operators. When you write a LINQ query it's turned into an expression tree. Based on the provider of the original list, the query is turned into an actual implementation of your LINQ query. Whether this is ultimately translated to a SQL query, an LDAP query, assembly instructions, etc. is of little concern to the writer of the LINQ query. Of course, there are some instances where you do care, if performance is important.
I believe the author's point was that any time you are iterating over a list, you should really think about whether you can replace the iteration with a LINQ query. It really doesn't matter whether the list being iterated is an underlying list of objects, a table in an RDBMS, etc.
> When you write a LINQ query it's turned into an expression tree.
I believe that this is not the case if you are using LINQ to Objects unless you stick an explicit cast in there somewhere, but I could be wrong. I think that if you simply use the LINQ extension methods on an in-memory IEnumerable, they are simple method calls and involve no compilation to an expression tree.
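For what it's worth, here is a minimal sketch of that distinction. The class and member names are made up for illustration. The same lambda text binds either to a delegate, which is what the Enumerable extension methods take, or to an expression tree, which is what the Queryable ones take, depending on the declared type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical demo class; none of these names come from the thread.
public static class DelegateVsExpressionDemo
{
    // Bound as compiled code: this is the shape Enumerable.Where accepts.
    public static readonly Func<int, bool> AsDelegate = x => x > 2;

    // Bound as data: this is the shape Queryable.Where accepts.
    public static readonly Expression<Func<int, bool>> AsTree = x => x > 2;

    // LINQ to Objects: an ordinary method call that invokes the delegate per element.
    public static List<int> FilterInMemory(IEnumerable<int> source)
    {
        return source.Where(AsDelegate).ToList();
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(", ", FilterInMemory(numbers))); // 3, 4
        Console.WriteLine(AsTree.Body.NodeType);                       // GreaterThan
    }
}
```

Only the second form gives a provider something to inspect and translate. The first is just a method the runtime calls.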
This is an absurd discussion. If you don't know C#, "x =>" is line noise. (More fairly, since (= (x y) z) is trivial to understand if you've ever seen any kind of Lisp, I should say instead -- if you don't know any C-like languages, "List<int> { 1, 2, 3 };" is line noise.)
I suggest you learn the language you program in; problem solved.
The point is that "easier to understand" is, in these cases, subjective enough, and dependent enough on prior experience, to be absolutely meaningless.
Yep, one thing to realize is that LINQ does not simply refer to the query syntax - it refers to a set of technologies, including expression trees and the ability to compile them to alternate languages like TSQL, the query syntax, and the extension methods like Select.
One small difference with your example though is that Select requires "using System.Linq", and ConvertAll is defined on List<T>.
And of course, the LINQ version returns an IEnumerable<TResult>, while the List version returns a List<TResult>. You could chain a .ToList() there to make them equivalent (if needed).
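A small sketch of the two forms side by side, assuming a List<int> converted to strings. The example being discussed isn't quoted here, so the class, method names and data below are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class ConvertAllVsSelectDemo
{
    // List<T>.ConvertAll: defined on List<T> itself, runs eagerly, returns a List<TOutput>.
    public static List<string> WithConvertAll(List<int> source)
    {
        return source.ConvertAll(n => n.ToString());
    }

    // Enumerable.Select: needs "using System.Linq", is lazy, returns an IEnumerable<TResult>.
    public static IEnumerable<string> WithSelect(List<int> source)
    {
        return source.Select(n => n.ToString());
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        List<string> eager = WithConvertAll(numbers);
        // Chaining ToList() materializes the sequence and makes the result types match.
        List<string> materialized = WithSelect(numbers).ToList();
        Console.WriteLine(eager.SequenceEqual(materialized)); // True
    }
}
```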
Clojure is compact and expressive, but there is a learning curve. Having data-centered and function-oriented programming features available for the mainstream is a great step forward. (Think mainstream programmers, not just mainstream languages.)
One small caution when using LINQ, especially in performance critical areas (XNA for instance):
.Count and .Count() differ dramatically in performance. When you simply want the size of an array or list, be sure to use the .Count property, as it is orders of magnitude faster. If you are using LINQ elsewhere, .Count() will produce the same result but will be much slower, so it can be a tough optimization to track down.
Thank you for this, this was apparently fixed in .NET 4.0 which I was unaware of.
I think there is still some validity in my post about the confusion over whether you are using native access or routing through an additional tool/library, but the fix definitely eliminates the performance hit I was discussing, so much of my post was in error.
edit: Also 3.5 it seems, I need to be more up-to-date with my concerns.
Similarly, it is very easy and intuitive and horribly inefficient to write something like:
int countOfThings = DatabaseRepositoryType.GetThings().Count();
This will go to the database, pull all of the matching rows locally, create all of the objects, and then iterate through them just to increment a counter. Creating a .GetCount() method which calls down to a SQL "SELECT COUNT(*)" query is more work to write, but runs about a thousand times faster.
The SQL generated in this case depends on the return type of GetThings(). If it returns an IEnumerable<> then it is indeed a terrible thing to do, but if it returns an IQueryable<> then the SQL will be a SELECT COUNT(*) query, and no objects will be created.
Care does have to be taken when writing this sort of code!
Yeah, I got that feeling too, when time measurements are in the microsecond range. I wonder how CPU cache and memory access patterns affect the results.
Many people find function chaining more intuitive to read, or enjoy it more after being exposed to it for a period of time. The reason that functions are used to set flags or parameters is one of the bases of functional programming, which is to remove side effects.
In the above example, the state of employees is the same after the execution of the statement as it was before. If you were to set the "Parallel" flag or "Ordered" flag beforehand, each as its own assignment statement, then you would have modified the initial object and created a side effect.
I am not challenging your opinion, but simply answering your question as to why.
I, for one, find that kind of function chaining somewhat unintuitive even while I understand the logic behind it. I expect methods of an object to modify that object, not to return a modified version of the object.
I agree, this is a strange claim. On the other hand (as long as we are with Clojure), something like
(->> employees parallelize maintain-order)
might be said to be more attractive because it looks more linear. The sequential order of application is thus maintained, and the parentheses which are perceived as additional levels of hierarchy are removed.
Of course, the dot-dot-dot style in Java/C#/etc is doing the same:
employees.Parallelize().MaintainOrder()
is also linear, the parentheses are only used to specify parameters.
I'm just being a smug weenie is all. Obviously there's not a real difference here. However, it's a better question as to whether this style or the "pipeline" style illustrated below and in the C# code is more pleasant.
I think one of the problems I have with it, is that AsParallel() doesn't seem to me like it should be a function. It should be an argument to a function. I dislike that.
That's perfectly natural, right? (I mean, you could claim that "is memoized or not" should be a "flag" on a function, but that's starting to sound a little silly to my ears.) It's exactly analogous.
I find it interesting the developer has to worry about the parallelization. Shouldn't the data persistence layer (be it a relational database or some other data engine) deal with that (and use parallelization whenever possible)?
Ideally, the tools should figure all this out. In practice, figuring out if there is enough work for parallelism to save you time is a hard problem, and programmer annotations help a lot. Additionally, PLINQ (parallel LINQ) is different from single-threaded LINQ in ways that can affect your program. For instance, it may return the results of a query in a different order than they were inserted into the collection. Some programs rely on the ordering in LINQ, so PLINQ is opt-in, rather than opt-out or tools-decide, at this point. There is more detail on this issue and some others here ( http://msdn.microsoft.com/en-us/magazine/cc163329.aspx ).
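To make the ordering point concrete, here is a rough sketch using the real PLINQ operators AsParallel() and AsOrdered(). The class and method names and the squaring workload are placeholders:

```csharp
using System;
using System.Linq;

// Hypothetical demo class; only AsParallel/AsOrdered are actual PLINQ operators.
public static class PlinqOrderingDemo
{
    // Opting in to parallelism only: results may come back in any order.
    public static int[] SquaresAnyOrder(int[] input)
    {
        return input.AsParallel().Select(x => x * x).ToArray();
    }

    // Opting in to parallelism and asking PLINQ to preserve source order.
    public static int[] SquaresInOrder(int[] input)
    {
        return input.AsParallel().AsOrdered().Select(x => x * x).ToArray();
    }

    public static void Main()
    {
        int[] input = Enumerable.Range(1, 1000).ToArray();
        int[] expected = input.Select(x => x * x).ToArray();

        // The ordered variant matches the sequential result element for element.
        Console.WriteLine(SquaresInOrder(input).SequenceEqual(expected)); // True

        // The unordered variant is only guaranteed to contain the same elements.
        Console.WriteLine(SquaresAnyOrder(input).OrderBy(x => x).SequenceEqual(expected)); // True
    }
}
```

Neither call mutates the input array. Each operator returns a new query object, which is the no-side-effects argument made elsewhere in the thread.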
Again, ideally, this decision should be passed down to the underlying datastore. It's only at that level that you have the reliable information on how the data is distributed among cores.
On what kinds of back-end do LINQ and PLINQ support parallelization?
I'm not understanding what underlying datastore you're referring to in the case of LINQ-to-Objects or LINQ-to-XML. These frameworks perform "queries" on ordinary objects or XML data stored in memory within the .Net process, not on some other machine as with LINQ-to-SQL. You use LINQ-to-Objects in place of a filtering foreach loop, and LINQ-to-XML in place of XPath et al.
PLINQ is just a parallel implementation of LINQ-to-Objects and LINQ-to-XML. It works in the same scenarios .Net programs work in, i.e. multicores or SMPs running Windows (not sure if PLINQ works with Mono).
I really like the LINQ extension methods and the functionality they provide, but I don't really like the query syntax. Method calls and lambdas seem so expressive and concise to me that I don't know why I'd start pretending like I was using TSQL to do operations on enumerables. I can definitely see its draw for some people though.
Any time you see toy benchmarks, ask yourself what was left out. The potential performance issue with LINQ comes when you are accessing database data over a connection with non-trivial latency. Then it is easy to wind up accidentally issuing large numbers of queries, with a lot of round trips, that wind up being slow and unnecessarily hard on the database.
LINQ is hardly unusual in having this risk. It is an easy mistake to make in many environments with many toolsets, and it is not particularly hard to avoid it with LINQ. But the toy benchmark doesn't address it. And the article makes fun of a software architect who has likely had bad experiences with developers messing up on this.
True for LINQ to SQL, not so much for LINQ to Objects as evaluated in the benchmarks. Of course, one toy benchmark over 1k objects doesn't fully characterize its performance, but at least we don't have to worry about the database latency free variable.
> accessing database data over a connection with non-trivial latency. ... But the toy benchmark doesn't address it
I disagree entirely. I know when I'm using LINQ to SQL, and when I'm using LINQ to objects. My experience is that LINQ to objects is more common and very useful. He's using larger data volumes than I generally am, so there's absolutely nothing "toy" about it.
Toy benchmarks frequently use large data volumes. That is easy to do. What is harder to do in a benchmark is providing realistically complex logic to go with those volumes. Many systems work well on the simple cases, but have bad edge cases that can get tickled by a more complex problem.
But the logic that I have successfully used LINQ on:
- operates on lists of objects, not database queries.
- operates on small quantities of data. It doesn't matter if there are only five items in a list. If you have to filter them, you have to filter them.
- is not that complex.
We use foreaches as well, or a combination. LINQ is great for automating simple things, e.g. turn a loop to find an item into a .FirstOrDefault(x => x.SomeCondition). It would be interesting to see if, as our grasp of LINQ improves, we run into any of these supposed corner cases. But it hasn't happened yet.
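As an illustration of that kind of rewrite, here is a sketch with a made-up Item type. Only SomeCondition echoes the name used above; everything else is assumed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for the example.
public sealed class Item
{
    public string Name { get; set; }
    public bool SomeCondition { get; set; }
}

public static class FindItemDemo
{
    // The hand-written loop: return the first match, or null if there is none.
    public static Item FindWithLoop(IEnumerable<Item> items)
    {
        foreach (var item in items)
        {
            if (item.SomeCondition)
            {
                return item;
            }
        }
        return null;
    }

    // The LINQ equivalent: FirstOrDefault also returns null for reference types when nothing matches.
    public static Item FindWithLinq(IEnumerable<Item> items)
    {
        return items.FirstOrDefault(x => x.SomeCondition);
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Name = "a", SomeCondition = false },
            new Item { Name = "b", SomeCondition = true },
            new Item { Name = "c", SomeCondition = true },
        };
        Console.WriteLine(FindWithLoop(items).Name); // b
        Console.WriteLine(FindWithLinq(items).Name); // b
    }
}
```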