by starburst 958 days ago
The first giveaway was the use of LINQ in a performance post
3 comments

This is a strange comment. LINQ is an extremely common way to write code in C#, and the performance of code that uses it is certainly relevant. Additionally, this is a performance comparison post. If the baseline uses LINQ but compares something else, the other tests should also use LINQ.
> LINQ is an extremely common way to write code in C#

It is extremely uncommon in performance contexts. It is actively discouraged and removed when writing performant C# code.

It is incredibly common in your "run of the mill" enterprise apps, or contexts where performance can slow down a bit for the sake of programmer happiness.

It isn’t uncommon when you know what it is doing. Wholesale removal or discouragement of LINQ is a sign of fake cargo-cult performance “optimization”.

It’s perfectly fine to use if you learn about how it works and how to use it properly.

LINQ is going to add overhead, regardless of "properly" using it or "cargo-culting" things; save the platitudes for the Monday zoom meeting.

LINQ adding overhead is a _technical reality_; it's how it works, and that is fine. It's a fine tool in many different contexts, but when we talk about performant code, the context is obviously one in which every cycle matters.

And those of us with enough experience know that LINQ performance and implementation details vary over time in the runtime, and those shifts aren't always positive.

So when writing code where performance is fundamental to the success of the application, avoid LINQ since it WILL add overhead and it will remove implementation control from your team. It is a risk without much benefit when you're in the performance arena. That doesn't mean it's not useful in many other contexts.

There are cases where using the straightforward LINQ code would be a lot faster than a lower level alternative. For example when the code can be vectorized and use AVX instructions, which is implemented for quite a few LINQ methods. A straightforward non-LINQ version of the code would likely be slower as most developers would not or can't write the low-level AVX version.

I'd certainly be careful about LINQ in certain performance-sensitive code, e.g. about creating unnecessary copies of the data and allocating too much. But I would not trust myself without measuring to really know whether it actually makes a difference or if my "optimized" code might be even slower.

A lot of LINQ was also optimized and improved with the move to .NET Core (now just .NET). It's definitely important to actually profile the code, rather than just assume LINQ is slow/less-performant. Most of the time, unless the developer has added unneeded code (such as calling .AsEnumerable, or most anything that evaluates the entire collection), the difference between LINQ and standard iterator based code will be nearly non-existent, with some of the cases you mentioned where LINQ has been optimized beyond what a developer can do by hand.
> AVX instructions, which is implemented for quite a few LINQ methods

Are you sure? Any examples of such methods? And does AVX actually help?

I don’t think that’s possible because IMO AVX and other SIMD can only help for dense inputs. The C# type for dense inputs is ReadOnlySpan; however, ReadOnlySpan doesn’t implement IEnumerable and is therefore incompatible with LINQ.

There’s even an alternative LINQ implementation to work around this, https://github.com/NetFabric/NetFabric.Hyperlinq, but that thing is a third-party library most people aren’t using.

LINQ is pretty much frowned upon when programming games in C#. Also, when doing a performance comparison, you want to get as close as possible to the actual code without the extra overhead.

I would very much verify anything and not take it at face value when a C# performance post uses LINQ.

LINQ codepaths are only getting faster. A literal army of engineers is focused on this stuff full time.

https://devblogs.microsoft.com/dotnet/performance_improvemen...

> dotnet/runtime#64470 is the result of analyzing various real-world code bases for use of Enumerable.Min and Enumerable.Max, and seeing that it’s very common to use these with arrays, often ones that are quite large. This PR updates the Min<T>(IEnumerable<T>) and Max<T>(IEnumerable<T>) overloads when the input is an int[] or long[] to vectorize the processing, using Vector<T>. The net effect of this is significantly faster execution time for larger arrays, but still improved performance even for short arrays (because the implementation is now able to access the array directly rather than going through the enumerable, leading to less allocation and interface dispatch and more applicable optimizations like inlining).

What are the chances that you'd have the patience to write a competitive, bug-free SIMD implementation?
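To make the "measure it" point concrete, here is a minimal sketch that puts the obvious hand-written loop next to `Enumerable.Max` on an `int[]`. The class and method names (`MaxComparison`, `MaxWithLoop`, `MaxWithLinq`, `Time`) are made up for illustration, and the `Stopwatch` timing is deliberately crude. Whether the LINQ call is actually vectorized depends on the runtime version you target, so treat this as a starting point rather than a result.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper for illustration; not taken from the article or the runtime.
public static class MaxComparison
{
    // Hand-written scalar loop: the "obvious" non-LINQ version.
    public static int MaxWithLoop(int[] values)
    {
        if (values.Length == 0)
            throw new InvalidOperationException("Sequence contains no elements");

        int max = values[0];
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > max) max = values[i];
        }
        return max;
    }

    // Resolves to Enumerable.Max(IEnumerable<int>), the overload the quoted PR text discusses.
    public static int MaxWithLinq(int[] values) => values.Max();

    // Crude wall-clock timing. No warm-up handling and no statistics,
    // so only large, repeatable differences mean anything.
    public static TimeSpan Time(Func<int> candidate, int iterations)
    {
        int sink = 0;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink ^= candidate(); // consume the result so the call is not dead code
        }
        stopwatch.Stop();
        GC.KeepAlive(sink);
        return stopwatch.Elapsed;
    }
}
```

You would call it as `MaxComparison.Time(() => MaxComparison.MaxWithLinq(data), 1_000)` and the same for `MaxWithLoop`, on the runtime you actually ship. A proper harness such as BenchmarkDotNet is the better choice for anything you intend to publish.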

Games in C# might be written in Unity, and a lot of those improvements wouldn't apply there. So in that context this might be accurate because it's an entirely different runtime.
The relationship between LINQ and performance is not trivial; it pretty much depends on what you do (more complex LINQ chain -> worse overhead).

It does have a base cost (allocating iterator objects), but it's less than you might think. I have seen enough game code that does intermediate list allocations when it doesn't need to, which are far costlier than LINQ.

In addition, benchmarks that do other useful work alongside the benchmarked aspect can sometimes be more illustrative and better overall, because how a particular approach works together with the surrounding code matters much more, and such benchmarks match real-world scenarios more closely.

And last but not least: in this case, using structs yields an additional advantage with LINQ, since monomorphization of methods whose generic arguments are structs has additional codegen quality benefits.

It is certainly possible to write slow code without LINQ. All I'm saying is that I wouldn't blindly trust a blog post that talks about performance and uses LINQ.
The article contents suggest deep understanding of the topic.

This type of thinking ("LINQ bad" or "SOLID good") is one reason among many why bad patterns proliferate through projects, e.g. "hey, you should rewrite this code with SOLID principles in mind" (without accounting for the context) or "This code calculates the sum using LINQ, you should rewrite it" (LINQ's Sum implementation uses SIMD and is hard to beat).

The article is fine, I was referencing the original article the article is referencing.
fwiw, it's possible to use LINQ in games no problem if you provide your own implementation(s) of the core LINQ methods you care about. I'm using LINQ and async/await in my game with no problems (~900fps, one short gen0/gen1 GC every 70 seconds or so) since I did the work to write zero-allocation versions of the basic operators like Select and Where. The design of LINQ is such that the C# compiler will use whatever implementation of the operator(s) is available when it converts your SQL-y queries into actual code.

I suspect the BCL doesn't include zero-allocation query operators because they generalize poorly, but I'm not sure. Zero-allocation query operators end up looking like 'ZeroAllocSelect<TEnumerable, TSource, TResult> Select (this TEnumerable seq, Func<TSource, TResult> func) where TEnumerable : IEnumerable<TSource>' which is obviously not trivial for the JIT to compile (or trivial to write)

The closures it creates for your queries are kind of a pain though. It's possible that will have improved in .NET 8 or .NET 9, because the allocation rules for delegates were recently revised to allow more optimizations, but I don't know if that was fixed.
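For anyone wondering what "provide your own implementation" can look like, here is a minimal sketch, assuming you only need `where` followed by `select` over an array. Every type name in it (`ArraySeq`, `WhereSeq`, `WhereSelectSeq`, `Demo`) is hypothetical; this is not the parent's code and not a BCL API. It relies on two language facts: query syntax binds to whatever `Where`/`Select` methods are in scope, and `foreach` only needs a `GetEnumerator()` whose result has `MoveNext()` and `Current`, so no `IEnumerable<T>` interface (and no boxing of the struct enumerator) is involved.

```csharp
using System;

// All type names below are hypothetical and exist only for this sketch.
public readonly struct ArraySeq<T>
{
    private readonly T[] _items;
    public ArraySeq(T[] items) => _items = items;

    // The query clause "where ..." binds to this method, not to System.Linq.
    public WhereSeq<T> Where(Func<T, bool> predicate) => new WhereSeq<T>(_items, predicate);
}

public readonly struct WhereSeq<T>
{
    private readonly T[] _items;
    private readonly Func<T, bool> _predicate;

    public WhereSeq(T[] items, Func<T, bool> predicate)
    {
        _items = items;
        _predicate = predicate;
    }

    // The query clause "select ..." binds to this method.
    public WhereSelectSeq<T, TResult> Select<TResult>(Func<T, TResult> selector)
        => new WhereSelectSeq<T, TResult>(_items, _predicate, selector);
}

public readonly struct WhereSelectSeq<T, TResult>
{
    private readonly T[] _items;
    private readonly Func<T, bool> _predicate;
    private readonly Func<T, TResult> _selector;

    public WhereSelectSeq(T[] items, Func<T, bool> predicate, Func<T, TResult> selector)
    {
        _items = items;
        _predicate = predicate;
        _selector = selector;
    }

    // foreach finds this by pattern; the enumerator is a struct and stays on the stack.
    public Enumerator GetEnumerator() => new Enumerator(_items, _predicate, _selector);

    public struct Enumerator
    {
        private readonly T[] _items;
        private readonly Func<T, bool> _predicate;
        private readonly Func<T, TResult> _selector;
        private int _index;

        public Enumerator(T[] items, Func<T, bool> predicate, Func<T, TResult> selector)
        {
            _items = items;
            _predicate = predicate;
            _selector = selector;
            _index = -1;
        }

        // Projects on demand; foreach reads Current once per successful MoveNext.
        public TResult Current => _selector(_items[_index]);

        public bool MoveNext()
        {
            while (++_index < _items.Length)
            {
                if (_predicate(_items[_index])) return true;
            }
            return false;
        }
    }
}

public static class Demo
{
    public static int SumEvenTimesTen(int[] data)
    {
        // Translated by the compiler to .Where(x => ...).Select(x => ...) on ArraySeq<int>.
        var query = from x in new ArraySeq<int>(data)
                    where x % 2 == 0
                    select x * 10;

        int sum = 0;
        foreach (int value in query) sum += value;
        return sum;
    }
}
```

The lambdas above capture nothing, so the compiler can cache their delegates; a lambda that captures a local still allocates a closure, which is the pain point mentioned above. The sketch also sidesteps the fully generic `TEnumerable` signature by hard-coding arrays, which is the "generalizes poorly" trade-off in miniature.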

I actually love the syntax, if only it didn't alloc. Is your implementation open source?
Wouldn't it be _more_ informative to see a "realworld" project, possibly built using LINQ, and the performance comparison done within the context of that project?
If what you want to measure is LINQ performance, sure, but in the context of measuring a language fundamental like class versus struct, it is an unnecessary overhead.

The article itself says it:

> However, the main reason why the benchmarks are not correct is because of LINQ and lazy evaluation.
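For readers who haven't been bitten by this, here is a minimal sketch of the lazy-evaluation trap that quote refers to. The names (`LazyEvaluationDemo`, `Run`) are made up for illustration. Building a `Select` query runs none of the projection; the work only happens when something enumerates it, so a benchmark that times only the query construction measures almost nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not taken from either article.
public static class LazyEvaluationDemo
{
    public static (int callsBeforeEnumeration, int callsAfterEnumeration) Run()
    {
        int calls = 0;
        int[] source = { 1, 2, 3, 4, 5 };

        // Only describes the work; the lambda has not run yet.
        IEnumerable<int> query = source.Select(x => { calls++; return x * 2; });
        int before = calls;

        // ToList() enumerates the query, so the lambda runs once per element here.
        List<int> materialized = query.ToList();
        int after = calls;

        return (before, after); // (0, 5) for the five-element source above
    }
}
```

So if one benchmarked method returns the un-enumerated query and another returns a materialized collection, the two numbers are not comparing the same work.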

Honestly, I wouldn’t mind knowing whether LINQ is fast when using one versus the other.