Hacker News new | ask | show | jobs
by Altern4tiveAcc 8 days ago
> SoA is weak if you are adding/removing monsters more often than accessing a single "hot" field.

Why is that? Genuinely curious. Does "weak" mean that it performs worse than AoS, or that the gains aren't as significant versus AoS?

2 comments

It's because removing a monster with 20 fields from an SoA structure means resizing 20 arrays. Removing the same monster from an AoS array involves resizing a single array, which you're going to process in a very cache friendly way.
I'm not sure why anybody would at the same time be implementing SoA AND resizing 20 arrays for a single delete, those things seem to be on either ends of the "I care about performance" spectrum.
The point is that a simple SoA implementation requires this - each field in the monster struct is an item in 20 different arrays. So, removing one monster means removing that item from those 20 arrays.

Now, as others have suggested, you can have a more complex implementation, where instead of removing the monster's fields from those arrays, you just mark them as "dead" or whatever and then skip them when consuming the relevant arrays, with some relatively small extra bookkeeping overhead. Of course, this comes with its own drawbacks, especially if the number of monsters is very dynamic and you are memory constrained.

The point is not to say that SoA is never good for performance, it obviously and certainly is, probably even in most cases. It's just not always best for performance, this was all.

> So, removing one monster means removing that item from those 20 arrays.

Removing from an array is not the same as resizing, which is what I commented on. Resizing is a very deliberate, bad, choice.

If you need to support deletes, you can do this without resizing an array. Either by tracking object lifetimes and inserting tombstones, or by swapping to fill in deleted objects. Both of them retain good performance characteristics. Both of them are easy.

This is not "simple vs complex" this is "I misunderstand vs understand SoA"

Assuming ordering isn't a concern, can't you just have a field called "removed" and skip those when iterating?

Or swap it with the last monster, and keeping an index for the last monster alive.

Sure, but these schemes might have their own drawbacks depending on the exact use case - especially if you have a very dynamic number of monsters and constantly add and remove them (say, some kind of bullet hell style game).
Then you have to read the "removed" field on every field read on every operation.

SoA is only useful when you don't read multiple fields for most operations.

Two fields should be fine, actually. The way caches are organized you are very unlikely to thrash with the lookups (due to n-way associativity) while only keeping relevant data in the cache at the same time. You still have roughly the following layout (in the cache), where A is the field and V is valid:

  | A1 A2 A3 A4 | A5 A6 A7 A8 | ...
  | V1 V2 V3 V4 | V5 V6 V7 V8 | ...
The former access pattern still yields a clean cache layout where no unnecessary data is loaded (which is the most costly operation here by far) as opposed to

  | A1 V1 B1 C1 | ... | A2 V2 B2 C2  | ...
In the general case there will exist a number of fields for which SOA layout will be worse if all are accessed close to each other, but for just a validity indicator this should not be the case. I think your statement is not wrong, but also not 100% correct.

This is on par to linear search being faster than binary search for small n. As soon as caches and branch prediction chime in many rules of thumb just change. Most importantly, however, is that a distinction between small and large n basically _needs_ to happen at that point.

Presumably they're referring to resizing the arrays.
Array resizing is avoidable with an embedded free list if ordering is of no concern.
If you take out ordering, then lookups on your SoA are now a search, and n-field lookup on an entity is now a JOIN operation.

The smarter you get about it, the closer you get to an OLAP db

Which leads to my theory… I feel like Bevy could be implemented on top of an in-memory DuckDB and get away with it

Depending on your access patterns, maybe you could have a hash table mapping entities ids to indexes in your SoA. Perhaps that's viable if looking up a single entity is not typical to your use case?

> Which leads to my theory… I feel like Bevy could be implemented on top of an in-memory DuckDB and get away with it

Haha, it certainly does sound viable.