by half-kh-hacker 1084 days ago
I'm excited for Project Valhalla to affect Minecraft performance; a lot of the temporary allocations are things like BlockPos objects (a Vec3i, basically) and VoxelShapes (a tree of axis aligned bounding boxes; so, like, an array of a struct that's six f64s) which seem ripe for becoming value objects that are inlined by HotSpot instead of living on the heap
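
To make the kind of temporary allocation described here concrete, a rough sketch follows. `Vec3i` and `NeighborScan` are hypothetical stand-ins, not Minecraft's actual classes, and the code is written as an ordinary final class because the Valhalla syntax is not settled. It only shows the shape of a small immutable type that might be a candidate for becoming a value object.

```java
// Hypothetical stand-in for a BlockPos-like type; not Minecraft's real class.
final class Vec3i {
    final int x, y, z;

    Vec3i(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Each call returns a fresh, short-lived object.
    Vec3i offset(int dx, int dy, int dz) {
        return new Vec3i(x + dx, y + dy, z + dz);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Vec3i)) return false;
        Vec3i other = (Vec3i) o;
        return x == other.x && y == other.y && z == other.z;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * x + y) + z;
    }
}

final class NeighborScan {
    // Sums the y coordinates of the six face neighbors,
    // creating one temporary Vec3i per neighbor.
    static int sumNeighborY(Vec3i center) {
        int[][] faces = {
            {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
        };
        int sum = 0;
        for (int[] f : faces) {
            Vec3i n = center.offset(f[0], f[1], f[2]); // temporary allocation
            sum += n.y;
        }
        return sum;
    }
}
```

Whether the JIT already removes these temporaries through escape analysis depends on inlining and the call site, so treat this as an illustration of the pattern rather than a measurement.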

Why not pool these objects? I'm primarily an Android developer and it's a well-known easy optimization on Android to avoid short-lived objects as much as possible, but especially during drawing and other actions that run every frame. You just don't use the word "new" in onDraw and other related methods. Android Studio would even warn you if you do.

But then the new APIs in the JDK itself are designed such that you have to allocate loads of short-lived small objects. I was told that HotSpot deals with them reasonably well so they don't degrade performance, but apparently it isn't very good at it?

Allocation is typically really cheap; maintaining pools for objects would likely have more overhead. And while collecting garbage takes resources too, it's heavily concurrent, especially in GCs like Shenandoah and ZGC. So instead of more overhead due to pooling on the thread that uses the objects, you have more overhead on a different thread during garbage collection.

So while it makes sense to avoid unnecessary allocations by using different APIs (e.g. not creating Streams in hot paths), pooling brings far more new problems with it. It might make sense for large objects, but generally requires in-depth analysis to make sure it actually helps.
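
As a small illustration of "using different APIs" in a hot path, here is a sketch with a hypothetical `HotPath` class. Both methods compute the same sum; the stream version sets up pipeline objects on each call, while the indexed loop does not. How much of that the JIT optimizes away in practice is not something this sketch measures.

```java
import java.util.List;

// Hypothetical example class; the names are for illustration only.
final class HotPath {
    // Stream version: creates a stream pipeline on every call.
    static int sumPositiveStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v > 0)
                .mapToInt(Integer::intValue)
                .sum();
    }

    // Loop version: same result without building a stream pipeline.
    static int sumPositiveLoop(List<Integer> values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            int v = values.get(i);
            if (v > 0) {
                sum += v;
            }
        }
        return sum;
    }
}
```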

Also, when plugins come into the equation, you need to make sure that those can't modify objects they aren't meant to modify, which involves copying objects. Additionally, some objects have different representations in the API (what's used by plugins) vs the implementation (what's used by vanilla Minecraft), so converting between those representations is another source of allocations.
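
A minimal sketch of that copying, assuming two hypothetical types (`MutablePos` for the implementation side, `ApiPos` for the plugin-facing side; neither is a real Minecraft or plugin API class). Every hand-off to a plugin allocates a new immutable snapshot, which is the extra allocation being described.

```java
// Hypothetical internal type: mutated and reused by the implementation.
final class MutablePos {
    int x, y, z;

    MutablePos set(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
        return this;
    }
}

// Hypothetical API type: immutable, so plugins cannot change internal state.
final class ApiPos {
    private final int x, y, z;

    ApiPos(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    int x() { return x; }
    int y() { return y; }
    int z() { return z; }

    // Converting between representations allocates a fresh object each time.
    static ApiPos copyOf(MutablePos internal) {
        return new ApiPos(internal.x, internal.y, internal.z);
    }
}
```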

Pooling would have been more overhead than allocation + deallocation? Do you have any relevant readings?

No idea how to do it in Java, but a few pointers ought to be enough. You can also skip clearing the memory area between allocations for things that aren't security sensitive, assuming Java does that clearing, which I would expect.

Allocation + deallocation might have more overhead together. I'll try to rephrase: Requesting an object from a pool might have more overhead than allocating a fresh object, and deallocating the object doesn't happen on an application thread but on a GC thread (depending on the GC, obviously). Aleksey Shipilev has good resources on how GCs in HotSpot typically allocate objects (https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/).

There definitely are scenarios where pooling might make sense, but basically the low-hanging fruit in that area in Minecraft has already been picked.
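
For comparison, a very simple pool might look like the sketch below. `Box` and `BoxPool` are hypothetical names, the pool is not thread-safe, and this makes no claim about real-world timings. It only shows the bookkeeping a pool adds on the application thread (a deque operation, a null check, an explicit release), whereas a fresh allocation from a TLAB is essentially a pointer bump, as the linked article describes.

```java
import java.util.ArrayDeque;

// Hypothetical pooled type: six doubles, like an axis-aligned bounding box.
final class Box {
    double minX, minY, minZ, maxX, maxY, maxZ;
}

// Minimal single-threaded pool; a sketch, not a production design.
final class BoxPool {
    private final ArrayDeque<Box> free = new ArrayDeque<>();

    // Reuses a released Box if one is available, otherwise allocates a new one.
    Box acquire() {
        Box b = free.pollFirst(); // returns null when the pool is empty
        return b != null ? b : new Box();
    }

    // The caller must not use the box after releasing it.
    void release(Box b) {
        free.addFirst(b);
    }

    int available() {
        return free.size();
    }
}
```

Note that the pool only helps if every user reliably calls `release`, which is the manual lifetime management a GC is supposed to remove.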

I got that, and thanks for the link. But I'm not convinced a pool would have more overhead. It is basically the same thing (not really, but then again, pretty much), except that you can encode more information about the usage than a general GC possibly can.

For sure there are other tradeoffs, and it is a question of whether it is worth it. But when we actually see allocation rates of many GB/s, that is not cheap even if you are able to offload the work to other cores.

The article also concludes:

>It is funny to consider that having TLABs is the way to experience more frequent GC pauses, just because the allocation is so damn cheap!

Sounds like a nightmare, the problem just snowballs, because now you'll be tempted to get into GC tuning.

Seems like there should be a performance benefit from not constantly blowing out your L2 cache. If you can keep your object pool hot there should be quite a bit of performance to be gained. The downside is that this would require active memory management (explicitly freeing the objects when you're done with them) and if you have that why bother programming in a GC language in the first place?

For short-lived objects, heap allocation is probably about as fast as allocating on the stack. By pooling you can end up moving objects out of the fast eden space into older generations.

I worked on optimising a Java library a few years back and one of the bigger speedups was removing all the object pooling code.

But that means that the pools still use the GC? If so then I fully agree. But the point of a pool in my mind would be to reuse the memory rather than allocate+deallocate it.

Agreed. Allocation is pretty cheap, but it can still be slowed by an order of magnitude by two things:

1. by using finalizers

2. by using object initialization blocks

If you avoid these two problems you get nanosecond-scale performance, because the JVM does not need to run code during object creation and GC.