HN Mirror

by pjmlp 3065 days ago

Here is a list. It is not easy to track all of them down, but the items may at least serve as keywords to make googling easier.

- structs

- unsafe code

- stack allocation in unsafe code (think alloca())

- attribute annotations for packing and inline calls across assemblies

- ref parameters

- ref returns

- readonly ref parameters (in parameters)

- Span<> and Memory<>

- Native memory allocation via Marshal (System.Runtime.InteropServices), SafeHandles

- Buffer and ArraySegment

- SIMD (with RyuJIT)

- Profiled code cache for JIT code background generation between executions (System.Runtime.ProfileOptimization)
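A few of the items above can be sketched in a handful of lines. This is only an illustration, assuming C# 7.2 or later; the names Point, ListSketch, Largest, Manhattan and SumOfSquares are made up for the example and do not come from the list.

```csharp
using System;

// Value type: stored inline in a local, a field, or an array element,
// rather than as a separate heap object.
public struct Point
{
    public int X;
    public int Y;
}

public static class ListSketch
{
    // ref return: hands back a reference to an array element,
    // so the caller can modify it in place without copying the struct.
    public static ref Point Largest(Point[] points)
    {
        int best = 0;
        for (int i = 1; i < points.Length; i++)
        {
            if (points[i].X > points[best].X) best = i;
        }
        return ref points[best];
    }

    // "in" parameter (read-only ref): passes the struct by reference
    // and forbids writes to it inside the method.
    public static int Manhattan(in Point p) => Math.Abs(p.X) + Math.Abs(p.Y);

    // Span<T> over stack-allocated memory; no unsafe context is needed.
    public static int SumOfSquares(int count)
    {
        // Keep count small: stack space is limited.
        Span<int> buffer = stackalloc int[count];
        for (int i = 0; i < buffer.Length; i++) buffer[i] = i * i;

        int sum = 0;
        foreach (int v in buffer) sum += v;
        return sum;
    }
}
```

A caller would write something like ref Point p = ref ListSketch.Largest(points); and any write through p then changes the array element itself.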

2 comments

celdon25 3065 days ago

Great list. It's important to understand when to use each one of these. Identify your bottleneck first, using a profiler. Execution time is often dominated by the CPU waiting on memory (blocking on the memory bus) rather than by the CPU calculations themselves, so if you start by writing SIMD, you may not get anywhere.

Accessing data on the stack instead of the heap is the #1 saver of execution time, in my experience. But your bottlenecks might be different. Locally scoped value-type variables are generally on the stack. Object-scoped and static fields and properties are on the heap.

Writes to local variables seem to be faster than reads, IIRC. The fastest operators seem to be the bitwise instructions, IIRC. If running in 32-bit mode, try to work with 32-bit integers. If running in 64-bit mode, try to work with 64-bit integers.
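To make the 32-bit versus 64-bit advice concrete, here is a minimal sketch of how a program can check which mode it is running in, plus one common bitwise substitution. WordSize and WrapPowerOfTwo are hypothetical names for illustration; Environment.Is64BitProcess and IntPtr.Size are standard .NET members. Whether any of this is faster in a given program should be measured rather than assumed.

```csharp
using System;

public static class WordSize
{
    // True when the current process runs as 64-bit.
    public static bool Is64Bit => Environment.Is64BitProcess;

    // Size of a native pointer in bytes: 4 in a 32-bit process, 8 in a 64-bit one.
    public static int PointerBytes => IntPtr.Size;

    // Bitwise alternative to (value % size) for a non-negative value.
    // Only valid when size is a power of two.
    public static int WrapPowerOfTwo(int value, int size) => value & (size - 1);
}
```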

Here's an example of a major, major improvement in performance:

for (int x = 0; x < this.Width; x++)
{
    for (int y = 0; y < this.Height; y++) { foo = bar; }
}

Much faster version (due to storing a copy of Width and Height on the stack instead of the heap):

int width = this.Width;
int height = this.Height;

for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++) { foo = bar; }
}
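One way to check this on your own machine is to time both loop shapes. The sketch below is an assumption-laden illustration, not the commenter's code: the Grid class, its auto-properties and the Time helper are hypothetical, and a single Stopwatch run is much cruder than a real profiler or benchmark harness. The JIT may also inline trivial property getters, so the gap can be small or absent depending on the runtime and on what the properties actually do.

```csharp
using System;
using System.Diagnostics;

public class Grid
{
    public int Width { get; set; }
    public int Height { get; set; }

    // Reads this.Width and this.Height in the loop conditions on every iteration.
    public long CountWithProperties()
    {
        long count = 0;
        for (int x = 0; x < this.Width; x++)
        {
            for (int y = 0; y < this.Height; y++) { count++; }
        }
        return count;
    }

    // Copies the bounds into locals once, then loops over the locals.
    public long CountWithLocals()
    {
        int width = this.Width;
        int height = this.Height;
        long count = 0;
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++) { count++; }
        }
        return count;
    }

    // Runs the work once as a warm-up (so JIT compilation is not timed),
    // then times a second run.
    public static TimeSpan Time(Func<long> work)
    {
        work();
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Typical use would be to build a Grid with large Width and Height, then compare Grid.Time(grid.CountWithProperties) against Grid.Time(grid.CountWithLocals) in a release build.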

My comment here describes roughly the approach I used to take advantage of stack-allocated memory (before Span<T> was available). https://news.ycombinator.com/item?id=15136627
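For readers who have not seen stack allocation without Span<T>, a minimal sketch looks like the following. It is a generic illustration of stackalloc with a raw pointer, not necessarily the approach described in the linked comment; StackBuffer and SumFirst are hypothetical names, and the code must be compiled with unsafe code enabled (the AllowUnsafeBlocks project setting or the -unsafe compiler switch).

```csharp
public static class StackBuffer
{
    // Fills a stack-allocated buffer with 1..count and returns the sum.
    public static unsafe int SumFirst(int count)
    {
        // The buffer lives on the stack and is released automatically
        // when the method returns. Keep count small: stack space is limited.
        int* buffer = stackalloc int[count];

        for (int i = 0; i < count; i++) buffer[i] = i + 1;

        int sum = 0;
        for (int i = 0; i < count; i++) sum += buffer[i];
        return sum;
    }
}
```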

eni 3065 days ago

Thanks! Your example is pretty interesting. Any reason why this is the case? In both cases, it is just accessing a memory location to read the value. Are there compiler optimization heuristics at play here? E.g., for the local variable, the compiler knows that its value does not change during the loop, so it can be kept in a register for faster access.

celdon25 3065 days ago

Register access isn't the issue. In the first example, this.Width and this.Height access the Width and Height properties of the current object. This requires a heap fetch on each iteration of the loop. There may be OS-specific nuances with automatic caching that I can't remember clearly enough to reliably mention.

If you can get rid of all heap lookups in your iterative loop, then you'll see a large speed boost if that was the bottleneck. Local variables exist on the stack, which tends to exist in the CPU cache when the current thread is active. https://msdn.microsoft.com/en-us/library/windows/desktop/ms6...

Unfortunately, method calls in C# have a much higher overhead than in C and C++. If you must do a method call in your loop, be sure to read this to see if your method can be inlined. As a rule of thumb, only very small methods (about 32 bytes of IL or less) are inlined automatically: https://stackoverflow.com/questions/473782/inline-functions-...
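Besides keeping methods small, .NET 4.5 and later let you ask for inlining explicitly with MethodImplOptions.AggressiveInlining. The sketch below shows the attribute on a small helper called from a loop; MathHelpers, Clamp and SumClamped are hypothetical names. The attribute is a hint to the JIT, not a guarantee, so it is still worth measuring.

```csharp
using System.Runtime.CompilerServices;

public static class MathHelpers
{
    // Asks the JIT to inline this method where it can.
    // This is a request, not a guarantee.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Clamp(int value, int min, int max)
    {
        if (value < min) return min;
        if (value > max) return max;
        return value;
    }

    // Calls the small helper on every iteration; if it is inlined,
    // the per-iteration call overhead goes away.
    public static long SumClamped(int[] values, int min, int max)
    {
        long total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += Clamp(values[i], min, max);
        }
        return total;
    }
}
```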

eni 3065 days ago

Thanks!

celdon25 3065 days ago

Also be sure to check this out: https://gist.github.com/jboner/2841832. Note in particular the L1/L2 cache reference latencies compared with a main memory reference.
