Hacker News new | ask | show | jobs
by imtringued 3021 days ago
>Am I correct in thinking that this would be pretty okay for most games, AAA or not?

No. Soft real time just means that the GC will not pause the process and and it will not cause stuttering during garbage collection pauses. In languages with GCs this is usually achieved by not allocating too many objects in your game loop.

If you write the 3D engine in erlang your game won't stutter but it will have an incredibly low framerate.

The secret ingredient to high performance is primarily data locality and obviously a compiler/runtime that can actually translate the program into efficient code. You can make your compiler as good as you want, without data locality and control over the memory layout the performance simply won't be good enough.

What do I mean by data locality? Primarily these things.

Avoid indirection through pointers.

Java's ArrayList is a nice example. You cannot store raw "int"s in an ArrayList. Only "Integer" objects. This means if you want to calculate the sum of the ArrayList the CPU will have to read the data from main memory if it is not in the CPU cache. This is roughly two orders of magnitude more expensive than an a single addition for example. If data can not be found in the cache it is called a "cache miss". Of course in python, javascript, erlang almost everything is a pointer that points.

Continguous storage of data.

Java still have primitive arrays. You can declare an int[] array to avoid the above problem. This means the integers are stored directly inside the array. Well what if the array is not in the cache? Wouldn't that incurr the same problem as a above?

A CPU is actually quite smart. It has a prefetching unit that can detect if you're loading data in a regular pattern and load the next piece of data according to that pattern.

If you're the CPU and see this pattern. What would you do to avoid a cache miss?

arr[0] arr[1] arr[2] arr[3] arr[x]

Of course you would start loading arr[4], arr[5], arr[6], arr[7] ahead of time!

Efficient storage

ArrayList stores a continguous list of pointers. On a 64bit architecture this means every pointer costs you 8 byte of memory. On top of that every object in Java has a "header". I don't know the exact number but let's say it's size is 8 bytes. Then there are alignment restrictions. Let's say each object is aligned by 8 bytes. 8+8+4 is 20. The nearest multiple of 8 is 24. We're storing a 4 byte number in 24 byte worth of memory. In other words if we are memory bandwidth constrained we can increase our performance by another 6x just by reducing the amount of "useless" data we have to read.

Mutation

There are efficient immutable datastructures but these usually involve indirection through pointers. Avoiding allocation of new memory avoids GC pauses or time spent in malloc/free. Mutating only a small part of a datastructure is more efficient than creating a copy.

Stack allocation

The primary benefit is that the top of the stack is usually in the L1 cache. Managing a stack is more efficient and a GC or manual memory allocation. It's just a pointer that gets incremented or decremented.

Erlang doesn't give you control over either of these. Ponylang has actors with isolated GC heaps and gives you control over memory layout and data locality.

1 comments

I couldn't have hoped for a better answer. Thanks!