| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Altern4tiveAcc 8 days ago
	> SoA is weak if you are adding/removing monsters more often than accessing a single "hot" field. Why is that? Genuinely curious. Does "weak" mean that it performs worse than AoS, or that the gains aren't as significant versus AoS?

2 comments

tsimionescu 8 days ago

It's because removing a monster with 20 fields from an SoA structure means resizing 20 arrays. Removing the same monster from an AoS array involves resizing a single array, which you're going to process in a very cache friendly way.

link

vouwfietsman 8 days ago

I'm not sure why anybody would at the same time be implementing SoA AND resizing 20 arrays for a single delete, those things seem to be on either ends of the "I care about performance" spectrum.

link

tsimionescu 8 days ago

The point is that a simple SoA implementation requires this - each field in the monster struct is an item in 20 different arrays. So, removing one monster means removing that item from those 20 arrays.

Now, as others have suggested, you can have a more complex implementation, where instead of removing the monster's fields from those arrays, you just mark them as "dead" or whatever and then skip them when consuming the relevant arrays, with some relatively small extra bookkeeping overhead. Of course, this comes with its own drawbacks, especially if the number of monsters is very dynamic and you are memory constrained.

The point is not to say that SoA is never good for performance, it obviously and certainly is, probably even in most cases. It's just not always best for performance, this was all.

link

vouwfietsman 5 days ago

> So, removing one monster means removing that item from those 20 arrays.

Removing from an array is not the same as resizing, which is what I commented on. Resizing is a very deliberate, bad, choice.

If you need to support deletes, you can do this without resizing an array. Either by tracking object lifetimes and inserting tombstones, or by swapping to fill in deleted objects. Both of them retain good performance characteristics. Both of them are easy.

This is not "simple vs complex" this is "I misunderstand vs understand SoA"

link

Altern4tiveAcc 8 days ago

Assuming ordering isn't a concern, can't you just have a field called "removed" and skip those when iterating?

Or swap it with the last monster, and keeping an index for the last monster alive.

link

tsimionescu 8 days ago

Sure, but these schemes might have their own drawbacks depending on the exact use case - especially if you have a very dynamic number of monsters and constantly add and remove them (say, some kind of bullet hell style game).

link

marcosdumay 8 days ago

Then you have to read the "removed" field on every field read on every operation.

SoA is only useful when you don't read multiple fields for most operations.

link

ablob 8 days ago

Two fields should be fine, actually. The way caches are organized you are very unlikely to thrash with the lookups (due to n-way associativity) while only keeping relevant data in the cache at the same time. You still have roughly the following layout (in the cache), where A is the field and V is valid:

  | A1 A2 A3 A4 | A5 A6 A7 A8 | ...
  | V1 V2 V3 V4 | V5 V6 V7 V8 | ...

The former access pattern still yields a clean cache layout where no unnecessary data is loaded (which is the most costly operation here by far) as opposed to

  | A1 V1 B1 C1 | ... | A2 V2 B2 C2  | ...

In the general case there will exist a number of fields for which SOA layout will be worse if all are accessed close to each other, but for just a validity indicator this should not be the case. I think your statement is not wrong, but also not 100% correct.

This is on par to linear search being faster than binary search for small n. As soon as caches and branch prediction chime in many rules of thumb just change. Most importantly, however, is that a distinction between small and large n basically _needs_ to happen at that point.

link

jayd16 8 days ago

Presumably they're referring to resizing the arrays.

link

gmueckl 8 days ago

Array resizing is avoidable with an embedded free list if ordering is of no concern.

link

setr 7 days ago

If you take out ordering, then lookups on your SoA are now a search, and n-field lookup on an entity is now a JOIN operation.

The smarter you get about it, the closer you get to an OLAP db

Which leads to my theory… I feel like Bevy could be implemented on top of an in-memory DuckDB and get away with it

link

Altern4tiveAcc 7 days ago

Depending on your access patterns, maybe you could have a hash table mapping entities ids to indexes in your SoA. Perhaps that's viable if looking up a single entity is not typical to your use case?

> Which leads to my theory… I feel like Bevy could be implemented on top of an in-memory DuckDB and get away with it

Haha, it certainly does sound viable.

link